At VideoAmp, leveraging cloud providers such as AWS has allowed our company to experience massive and immediate growth through fast iteration of development, flexible scale, and readily available services that solve complex infrastructure problems. These resources were invaluable at startup and key to the initial success of the business. However, as the company grew and our scale reached new heights, the cost to run our infrastructure grew at a formidable rate. Despite all efforts to control cost while maintaining company growth, it became clear that new options and methods needed to be explored to lower our growing infrastructure costs.
If you search for the definition of “Cloud Computing” you will most likely find something similar to “the practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer.” Very early in the advent of computers, the concept of “sharing” or “buying” time on compute systems became an obvious solution for maximizing the efficiency of idle compute time. Later, this concept grew into a full business model, generating a new category of technology business best referred to as “cloud providers.” Cloud computing services have become a common way for companies or individuals to forego the upfront cost and effort of buying dedicated server resources by leasing virtualized server time from a separate entity. Some of the big names in the space today include Amazon Web Services (AWS), Microsoft Azure, IBM Cloud, and Google Cloud, along with many others offering various resources and services. The quick setup time and overall convenience of these systems have fueled the growth of technology in the world around us by crushing a major barrier to entry into IT.
Although expedient, the choice to build a business on the platform of a cloud provider does not come without effort and cost, along with the many considerations facing a company embarking on a journey of “hosting in the cloud.” First, the skill set required to run a cloud-based model does not map directly onto comparable skill sets in the world of networking and data center management; cloud platforms come with their own learning curves for your IT staff. The many choices of providers and methods can also be overwhelming. In addition, the choice to host your company’s intellectual property on someone else’s infrastructure can be difficult to digest. The urgency to “just get started” can sometimes overshadow the forward thinking required to develop a proper cloud plan. It is important to consider as many factors as possible to determine the best way to “build for the future.” Scale requirements, software choices, managed services, and a variety of other issues can all have a big impact on your options and cost as you move forward with a cloud provider. In many cases, an upfront and conscious accrual of technical debt is required to get a business off the ground. Said another way, the future effort is a known concern, but the immediate need to survive and thrive takes precedence.
For us, the decision to “leave the cloud” was not an easy one. Our infrastructure runs many microservices, some developed in-house and others from open-source and community projects. We were also leveraging some of the more complex AWS services, such as S3 (Simple Storage Service), RDS (Relational Database Service), and ECS (Elastic Container Service), to name a few. From the beginning, we made a conscious decision that anything not requiring the complexities of an AWS-provided service should be built on general virtualized machines in AWS EC2 (Elastic Compute Cloud) to preserve the ability to relocate the service in the future if needed. This early decision was key to our ability to move services when we reached this critical point.
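The portability principle described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the class and function names are mine, not VideoAmp’s): a service that depends on a small storage interface, rather than calling S3 directly, can be pointed at either a cloud backend or local hardware without changes.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BlobStore(ABC):
    """Minimal storage interface so services never talk to a cloud API directly."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class LocalDiskStore(BlobStore):
    """Backend for self-hosted hardware: blobs live under a root directory."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self.root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


# An S3-backed implementation (e.g. built on boto3) would satisfy the same
# interface, so callers can be repointed at either environment unchanged.
```

The design choice here is the point, not the code: any service written against `BlobStore` stays movable, which is exactly the flexibility that made the later migration possible.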
With all of these factors to consider, we began planning what our future infrastructure should look like in the “new model.” From a pure cost standpoint, the decision to “move everything” seemed obvious at first. But on closer consideration, the complexities of some systems became apparent, and running a comparable level of service for them in our own data centers offered no benefit over our provider’s options. Because of this realization, a “hybrid” compute model was our best path forward. Here, I am using the word “hybrid” to describe a model in which an infrastructure runs in two places with two separate methods: on one side of the model sits AWS, and on the other, a physical data center with dedicated network and server hardware.
Once we made the plan to move forward with this new model, it was time to get to work. There were many decisions to make: where to locate our physical data center space, what network and server hardware to buy, how to connect the new environment back to AWS, and how to coordinate our containerized services outside of ECS.
The list goes on. This process is a pivotal step in the decision to “migrate,” and for most companies, significant cost savings must be available to justify the effort and the various organizational requirements. For us, the decision was clear based on our infrastructure spend combined with our available resources and our competence to administer the new model: we knew we could not only justify but also successfully implement the change. From here, the work began, and our first systems were installed in our new physical space. At this point, the easy part was complete!
After installing hardware, bringing the network online, and provisioning our new systems for use, we began the effort to “bridge” our environments, a critical step in building an infrastructure that can support a successful migration of services. Many cloud providers, including AWS, offer dedicated services for external network connectivity. By leveraging the AWS Direct Connect service, we configured a physical connection between our two environments, allowing seamless connectivity across networks. At this point, we were ready to commence our service migration.
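One concrete prerequisite for bridging networks like this: the address ranges on the two sides must not overlap, or routing across the link becomes ambiguous. A small sketch of that sanity check, using Python’s standard `ipaddress` module with purely illustrative CIDR blocks (not our actual address plan):

```python
import ipaddress


def ranges_conflict(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share any addresses."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


# Hypothetical address plan: the cloud VPC and the data-center LAN must be
# disjoint so routes advertised across the dedicated link stay unambiguous.
vpc_cidr = "10.0.0.0/16"         # cloud side (illustrative)
datacenter_cidr = "10.1.0.0/16"  # physical side (illustrative)

assert not ranges_conflict(vpc_cidr, datacenter_cidr)
```

Checking this before the link comes up is much cheaper than renumbering a live network afterward.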
Migrating services can be a daunting task in itself. Startup companies are notorious for their lightning speed in early years. The product goes from an idea to a complex machine in a very short period of time. With this immense growth come many layers of service, configuration, and other aspects of a working product that all have to be reconsidered and moved to their new home in the new infrastructure model. Doing this while continuing business growth, and without disrupting the pace of development, presents many challenges for the migration phase. A company must “divide and conquer” with its available resources, maintaining progress on both fronts. The task is a true team effort and requires work from many parts of the organization.
In our case, we took an iterative approach, moving services in a carefully coordinated order over many months. In AWS we were heavily utilizing Amazon’s container services; a large part of our environment was built using Docker containers. Containerizing our infrastructure early on was a key factor in our ability to migrate quickly and with minimal service disruption. Our biggest hurdle was finding a way to coordinate containers in our new environment. In AWS the process was easy: setting up a container deployment is, for the most part, “point and click” using the Elastic Container Service (ECS). In a physical data center, that same level of coordination can be difficult. After careful review and much consideration of our long-term goals, we settled on Kubernetes (initially developed by Google and now maintained by the community) to accommodate our container coordination needs. With a new method for managing our containers, we were ready to move.
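To make the ECS-to-Kubernetes shift concrete, here is a sketch of the kind of object Kubernetes uses in place of an ECS service definition. The service name and image are hypothetical; the helper builds a minimal Deployment as a Python dict and emits it as JSON, which `kubectl apply -f` accepts just as it does YAML.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int) -> dict:
    """Build a minimal Kubernetes Deployment for a containerized service."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical service: three replicas of one Docker image, declaratively.
print(json.dumps(deployment_manifest("example-service", "example/image:1.0", 3), indent=2))
```

The appeal for a migration like ours is that the same container image that ran under ECS runs unmodified here; only the orchestration layer around it changes.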
Aside from container coordination, many other challenges arose; hurdles and blockers were crushed, and progress continued. During the migration, our business was still growing and changing at a rapid pace: our scale needs changed, our product offerings evolved, and our business relationships continued to grow, among many other considerations. We even made a scale adjustment midway through the process, requiring additional hardware and new technologies to be added to the implementation plan. I cannot say that the process was smooth and easy, but I can say that we are very proud of the efforts of our entire team working together to complete the project.
Moving forward, we continue to utilize AWS to serve specific purposes in our new model. We leverage a wide range of platform services to provide simple solutions to complex infrastructure problems. One good example is our use of “on demand” compute resources for flexible scale and rapid development iteration. A specific use case is an in-house-developed, now open-source technology we call “Flint” (https://github.com/VideoAmp/flint). Flint quickly and automatically deploys Apache Spark clusters on AWS EC2, using both Spot and On-Demand instances, which gives us significant capacity as needed to run large data workflows for short periods of time. This convenience would be much more difficult and expensive to maintain in our physical data center.
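The economics of these short-lived burst clusters can be sketched with simple arithmetic. The instance counts and hourly rates below are hypothetical placeholders, not real AWS prices (Spot prices fluctuate and are typically a substantial discount to On-Demand); the point is that cost scales with hours actually used, whereas owned hardware sized for the peak would sit idle between bursts.

```python
def burst_cost(instances: int, hours: float, hourly_rate: float) -> float:
    """Cost of running a short-lived cluster at a given per-instance rate."""
    return instances * hours * hourly_rate


# Hypothetical 50-node Spark cluster running a 3-hour workflow.
on_demand = burst_cost(instances=50, hours=3, hourly_rate=1.00)
spot = burst_cost(instances=50, hours=3, hourly_rate=0.30)

print(f"on-demand: ${on_demand:.2f}, spot: ${spot:.2f}")
```

Under these illustrative numbers the burst costs $150 at the On-Demand rate and $45 on Spot, and either figure disappears entirely once the job finishes, which is the property a fixed physical footprint cannot match.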
In summary, maintaining the convenience of on-demand resources, combined with the cost savings of running big, static environments on physical hardware, gives us the ability to control cost without a loss of convenience or capability. The decision to move to this type of model will be different for every company; the timing, scale, current landscape, and other factors should be weighed carefully. For anyone interested in a more direct discussion about this or anything related, please comment below. We would love to hear from you!