Balancing Cost Advantages in the Cloud with the Performance Requirements of Real-Time Workflow

“The cloud” has existed for decades. Remember when the official graphic symbol for the internet was an “evil” cloud? Today, however, cloud is one of the most frequently used buzzwords. Countless companies offer a solution with or in the cloud, and it seems that at least one part of every software solution or hardware setup has to live in the cloud somewhere, somehow.

This is an interesting development considering how fundamentally the way we collaborate and use both hardware and software has changed over just a few decades, moving away from large mainframes built by former market leaders such as IBM, SGI, Cray and others toward personal computers. Although PCs have become more powerful and affordable, this approach has two major downsides. First, post-production facilities require high-end workstations powerful enough to run all their applications and still provide sufficient render power. Second, the maintenance effort for all of these individual workstations is enormous, and keeping every machine up to date with the latest software version is a costly endeavor.

At the end of the 1990s, a few companies saw an opportunity to cut hardware and software update costs by centralizing their applications and moving them onto the internet. Without sacrificing functionality, new web-based versions of certain applications became accessible over a generic internet connection, and the idea of the application service provider (ASP) was born. As with a terminal server-based infrastructure, the maintenance burden of every application was taken away from the individual workstations and shifted to a central location: the cloud.

In 2001, software as a service (SAAS) arrived. Based on the same idea as ASP, SAAS provides a rental model whereby the software lives in the cloud and post-production facilities access any desired application via the internet. There are some differences between ASP and SAAS, but both solutions essentially go back to the mainframe idea of the 1960s, where large vendors provide the computing or render power and users benefit from the cost savings.

The trend is obvious, and the intention to pursue it becomes even clearer when looking at the hiring of organizations that are already big players in the field: these vendors primarily seek personnel who are familiar with, and can contribute to, SAAS-based solutions.

Don’t Buy It – Rent It! Cloud Computing in Post-Production

Most post-production companies work with specialized application packages, and some of the larger post software vendors have already recognized the trend and moved their solutions into the cloud. One well-known example is Adobe, with its Creative Cloud offering, where almost every application package is already available as a cloud-based service. This development introduces a whole new sales approach: licensing through the cloud instead of selling traditional software licenses. Customers rent the desired application, and it is only available through the cloud. Another large vendor pursuing the trend is Autodesk, which has acquired many of its competitors in recent years and is investing to maintain its market leadership.

The Challenge: High Performance Requirements in Color-Correction

Most color-grading applications from high-end vendors such as Autodesk, BlackMagic, MTI, Image Systems Digital Vision, and others work with uncompressed content. Keeping the full amount of data in every single frame guarantees the best possible output and meets the high demands of cinema quality. An uncompressed workflow, however, requires substantial throughput to play a sequence of uncompressed frames flawlessly.

As an example, take a standard application requiring 2K uncompressed playback at 24fps: every single 2K image has an average file size of 12.8 MB, without audio. Thus, a data transfer rate of about 300 MB/sec is required to play a 2K uncompressed sequence seamlessly. In other words, the system has only about 40 milliseconds per frame to seek, find, and transfer the data to the workstation for flawless playback. Color-grading clients are usually connected via Fibre Channel to a DAS or SAN environment, so a data transfer rate of 300 MB/sec is easily achievable.
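
The arithmetic behind those figures is straightforward; the short sketch below (Python, purely as an illustration) reproduces them from the 12.8 MB frame size and the 24fps playback rate quoted above.

```python
# Back-of-the-envelope check of the 2K uncompressed playback figures above.
# Assumes the article's average frame size of 12.8 MB and 24 fps playback.

FRAME_SIZE_MB = 12.8   # average 2K uncompressed frame size (per the text)
FPS = 24               # cinema playback rate

throughput_mb_s = FRAME_SIZE_MB * FPS   # sustained data rate needed
time_per_frame_ms = 1000 / FPS          # budget to seek, find, and transfer one frame

print(f"Required throughput: ~{throughput_mb_s:.0f} MB/sec")   # ~307 MB/sec, roughly 300 MB/sec
print(f"Time budget per frame: ~{time_per_frame_ms:.1f} ms")   # ~41.7 ms, roughly 40 ms
```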

Moving only the license of the color-grading application into the cloud while keeping the actual application local would be quite simple, yet that model cannot really be called cloud computing. Truly creating a SAAS-based setup by moving the entire color-grading application into the cloud, away from the local workstation, poses some challenges.

Nowadays, every post-production facility has a high-speed internet connection, but it is still necessary to equip the local workstation with sufficient graphics power to process and display footage at 300 MB/sec. How does the cloud-hosted application work together with that expensive graphics card? Eventually it will be possible to offload most of the workload from the local CPU and GPU to the cloud and get by with smaller, cheaper graphics cards. For the time being, however, certain core modules and libraries will have to live on the local workstation, and a certain amount of rendering performance will have to stay local, in order to sustain a real-time workflow. It is also critical to avoid the excess traffic that would result from streaming content from the workstation to the cloud and back again before it can be displayed on the local screen.

Next Steps and Technical Barriers

If current technical developments continue and eventually all applications are truly cloud-based, what would be the logical next step? Moving storage out of the local office into the cloud as well. Let someone else deal with aging hardware, power and cooling requirements, maintenance issues and, of course, the ever-growing pile of big data, including the daily backup. Aside from the security concerns discussed later, this probably sounds like an ideal, almost maintenance-free solution for every IT administrator.

An uncompressed 2K stream would require 300 MB/sec, which in internet terms is 2.4 Gbit/sec. In the last few years, storage vendors providing scale-out NAS solutions have tackled this particular issue, and some have achieved similar throughput. When it comes to consistency and predictability, however, the same cannot be said.

The culprit? TCP/IP is not designed for predictable high performance. No matter how large the data gets or how fast it travels, it will be chopped up into chunks of 1500 to 9600 bytes, generating a huge protocol-maintenance overhead. To transfer the content over the internet while writing or playing back, you would end up needing an internet connection of at least 10 Gbit/sec, a throughput usually provided only by the internal backbone of the local facility.
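
To illustrate the scale of that overhead, the sketch below estimates how many packets per second a 300 MB/sec stream breaks into at the two chunk sizes mentioned above; the 40-byte figure for IP and TCP headers is an assumption made purely for illustration.

```python
# Rough packet-count estimate for a 300 MB/sec (2.4 Gbit/sec) uncompressed stream
# at the two chunk sizes mentioned above. The 40-byte IPv4 + TCP header figure
# is an assumption for illustration only.

STREAM_BYTES_PER_SEC = 300 * 1000 * 1000   # ~300 MB/sec uncompressed 2K stream
IP_TCP_HEADERS = 40                        # assumed IPv4 + TCP headers per packet

for mtu in (1500, 9600):                   # standard frames vs. jumbo frames
    payload = mtu - IP_TCP_HEADERS         # usable payload bytes per packet
    packets_per_sec = STREAM_BYTES_PER_SEC / payload
    overhead_pct = 100 * IP_TCP_HEADERS / payload
    print(f"MTU {mtu}: ~{packets_per_sec:,.0f} packets/sec, "
          f"~{overhead_pct:.1f}% header overhead")
```

Even with jumbo frames, the receiving end still has to process tens of thousands of packets every second before a single frame can be displayed.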

You may now be thinking that the days of uncompressed, single-frame-based workflows are nearly over, as everyone will work with compressed or container formats in the future. And this may be correct, at least partially. Audience expectations of picture quality are actually rising, as the latest blockbuster productions such as The Hobbit (photographed in 5K at 48fps) show. The fact that movie projectors are advancing and can now handle 8K content also suggests that content, especially for movie theaters, will remain high-resolution, uncompressed data. Even the consumer electronics industry, with video amplifiers that support 4K passthrough or upscaling to 4K, shows that audiences are not willing to put up with heavily compressed content.

Even if compressed formats are used more frequently, a flawless and smooth post-production process still requires predictable performance. A DV50-compressed 1080p25 stream still takes up 288 Mbit/sec plus overhead, and sufficient bandwidth headroom for all the latencies involved is still required. Therefore, even if the internet provider can offer a 288 Mbit/sec high-speed connection, the available bandwidth would be sufficient for only one color-correction system working in real time.
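
Purely as an illustration, the sketch below scales that stream figure up to several concurrent suites; the 10% allowance for protocol overhead is an assumption, not a measured value.

```python
# Bandwidth needed for N concurrent real-time streams at the 288 Mbit/sec
# figure quoted above. The 10% protocol-overhead allowance is an assumption.

STREAM_MBIT = 288          # compressed 1080p25 stream, as quoted in the text
OVERHEAD = 1.10            # assumed +10% for protocol overhead and headroom

for suites in (1, 2, 4, 8):
    needed_mbit = suites * STREAM_MBIT * OVERHEAD
    print(f"{suites} color-correction suite(s): ~{needed_mbit:,.0f} Mbit/sec sustained")
```

Even a single suite slightly exceeds a 288 Mbit/sec connection once any overhead is allowed for, and every additional real-time client multiplies the requirement.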

Problems with the Cloud as a Render Cluster

Using the cloud as a render cluster for the final color-correction render puts even more pressure on bandwidth and throughput. Since the cloud render cluster must read the source images before it can render anything, the raw images have to be uploaded first, rendered, and then sent back. A 90-minute feature film in 2K holds on average 1.7 TB of data, so the time required to transfer the data to the cloud render cluster and back would be immense.
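
To put that figure in perspective, the sketch below estimates the one-way transfer time for 1.7 TB at a few example connection speeds; the link speeds are chosen purely for illustration.

```python
# One-way transfer time for a 1.7 TB 2K feature (the figure quoted above)
# at a few illustrative link speeds.

FEATURE_BITS = 1.7e12 * 8                  # 1.7 TB (decimal) expressed in bits

for gbit in (0.1, 1.0, 10.0):              # 100 Mbit, 1 Gbit, 10 Gbit links (assumed)
    seconds = FEATURE_BITS / (gbit * 1e9)
    print(f"{gbit:>4} Gbit/sec link: ~{seconds / 3600:.1f} hours one way")
```

And the rendered frames still have to travel back over the same link, roughly doubling the total.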

A render cluster in the cloud raises additional problems. Because many render clients compute the final images and the results travel over TCP/IP, the data coming back from the cloud arrives out of sequence, and each file lands randomized and fragmented on the local disk. This is a challenge every standard file system already suffers from, resulting in performance loss due to file fragmentation, randomization and interleaving on the disk. Before the rendered content can be played back in real time and stutter-free, the local administrator has to optimize the content first, a very time-consuming maintenance task, especially in a post-production facility where the average volume size is between 100 TB and 500 TB.

What about Content Security?

It is interesting how many people in the field say “let’s put it in the cloud,” regardless of whether it concerns a simple web presence, a mail server or even crucial data. The term cloud is now used for anything outside the local firewall. Many people do not seem to realize that the content they put in the cloud becomes, to some degree, public. There are legitimate, reasonable concerns when it comes to data security and backup in the cloud.

Backup does not necessarily need to be a major concern, as most cloud-storage vendors provide backup services for additional fees. Data security, however, is a completely different challenge, especially in the movie industry. Production companies, film distributors and pretty much the entire movie industry are already fighting piracy, and this serious, ongoing problem might get even worse if post-production outsources certain processes to the cloud, whether during the original post-production process or for further distribution on DVD and other formats.

While popular cloud applications like Dropbox make it easier to share data around the world, they are a headache for local IT and a nightmare for the legal department. Where is the content, really? These are justifiable concerns and exactly the questions raised by many production companies. It is also a common reason why customers who dislike the idea of their content being in the cloud will not contract with a post-production facility that uses any sort of cloud storage, cloud computing or even Dropbox. As a side note, in most countries where cloud services are offered, the current legal position is that the owner of the content has to ensure the data stays healthy; the vendor carries essentially no liability.

There is also the option of a “private cloud,” where a dedicated server (an email server, for example) runs in the cloud, yet its owner has sole access and can enforce firewall rules to increase data security. As an example, the BBC and Red Bull have their own private clouds, provided by Aframe as infrastructure as a service (IAAS). This service includes special features such as file encryption, replication and sharing over a secured protocol. The concept is intended for collaboration at lower resolutions, as it is meant to simplify collaboration between editors and directors and, if necessary, allow sharing internationally.

Most likely, lossless compression and container formats will change the workflow in the near future, and the infrastructure will continue to get faster. However, large media organizations, content distributors and production companies may not change their minds about the risk of finding their unreleased content in the cloud. The same goes for car companies such as GMC, Ford or Porsche and their new car designs.

Potential Cost Savings with Cloud Computing

Aside from the data security issue, there is great value for companies in leveraging the cloud.

Great cost-saving potential lies in outsourcing render capacity, as offered by Amazon’s EC2 (Elastic Compute Cloud) service. This service provides the option to build a customized render farm for a relatively small amount of money. A pioneer in using this cloud service for a full production is the visual effects studio Atomic Fiction.

Another opportunity to decrease costs with the cloud is the rent-an-application model. A rental license has to be priced far lower than the official purchase price of the same software package; after all, it is only rented, not owned, by the customer. A license pool can be enlarged for a few weeks or a month instead of purchasing all the licenses outright for a lifetime. Many companies will certainly see an advantage in SAAS and may prefer to pay a monthly license fee rather than buying the software; the software turns into an expense instead of an asset. The customer is also relieved of maintenance and update tasks for new versions.

That said, as most of us have probably learned the hard way, the newer software version is not always the better one, and in certain cases you might have preferred to keep working with the previous version. In addition, if a software patch is rolled out to all current installations, it immediately affects a few hundred end users at once.

Power and service outages can be very harmful for a facility working on SAAS or IAAS infrastructure, as everything depends on the internet connection: if the connection goes down, all work immediately comes to a halt. Even Amazon’s S3 cloud has failed, as did Google’s cloud in 2011, and poor weather conditions have led to outages of Microsoft’s cloud. These examples show that even large providers cannot guarantee an uptime of more than 99.999% (which, after all, still amounts to a calculated downtime of about five minutes per year). Nor does it have to be a total failure of the cloud or the internet connection; another example is what happened to the Azure cloud when an SSL certificate became invalid, rendering the entire service unusable.
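
For reference, the downtime implied by common availability figures can be computed directly; the sketch below shows where the roughly five-minutes-per-year number comes from.

```python
# Yearly downtime implied by common availability figures.

MINUTES_PER_YEAR = 365.25 * 24 * 60

for availability in (99.9, 99.99, 99.999):
    downtime_min = MINUTES_PER_YEAR * (1 - availability / 100)
    print(f"{availability}% uptime: ~{downtime_min:.0f} minutes of downtime per year")
```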

Conclusion

In summary, all of the services discussed (ASP, SAAS and IAAS) offer interesting opportunities to significantly decrease costs or to simplify particular tasks such as maintenance. However, the pros and cons have to be weighed carefully, as certain downsides can potentially cripple an entire facility.

The bottom line for the time being? Even with all the available high-speed internet connections, lossless compression formats and so on, it is still necessary to maintain some kind of local high-speed SAN infrastructure.

The ideal strategy for securing access to data is to choose a provider that operates more than one datacenter, so that the data is always available in more than one physical location.