Or, One Way to Torture-Test a SAN

FrameBlender is a three-year-old Toronto production and post facility that went all-HD in 2006. It wasn’t a difficult transition, but the company decided it needed to upgrade its pipeline from a ramshackle mix of technologies to a disciplined SAN solution – a TerraBlock 24D with 9.6 TB of storage capacity.

Facilis Technology VP Jim McKenna described the advantages of the TerraBlock system for FrameBlender this way: “When you’re working with many streams of high-bandwidth media, whether it be DVCPRO HD or Apple’s ProRes codec, it becomes very taxing on a shared-storage system,” he said. “Many SANs will have you hold on to local storage because it’s the only way to have that many streams of high-bandwidth material.

“TerraBlock is direct-attached storage, 4-Gig fibre, and a server-direct system, meaning you have block-level access. Making every volume look like a local SCSI drive does more than ensure compatibility and simplicity. It has very high bandwidth because there is very little interference between the client and the block level of the storage that has been virtualized in the storage system. Each client believes it owns that particular SCSI drive – and it’s going to hammer it as hard as it can.”

Film & Video talked to FrameBlender founding partner Tim Martin about what decisions were made, and why.

F&V: You’ve started doing your own production, as well as post. What are your production capabilities like?

TIM MARTIN: We don’t have any studio space, so all of our productions are shot outside of our location. They range from lifestyle programming to eight-camera shoots for major international artists to single-camera dramatic work.

What cameras do you have?

We have [Panasonic] HVX200s in house, but we’ve shot on everything. We did a film with Sony’s Z1U, and we’ve done an eight-camera Sony F950 shoot. And obviously we have [Red Digital Cinema] Reds on order.

Can you explain why you decided you had to pull the trigger on a SAN, and describe how you made those decisions?

Last summer, we were at four edit suites and we had three fully populated [Apple] Xserve RAIDs, so we were hovering around 10 TB of fibre storage across those suites. That allowed us to do up to 10-bit uncompressed HD onlines. And we were doing a lot of our offlines off FireWire 800 drives, so we were kind of a sneakernet facility, with drives moving from suite to suite. We had 30 drives, and we were managing what was on every drive – we had to continually catalog them. Is that backed up? Where are the source tapes? That kind of fun stuff.

You were manually tracking everything in your libraries.

Yep. Unfortunately, that still exists. But we were finding larger jobs where we needed to ensure that multiple editors and assistants could collaborate simultaneously in multiple rooms, and also jobs with such a fast turnaround that editors needed access to a pool of data without having to worry about, say, cloning one FireWire drive to eight others. We needed something that was going to be very simple to maintain, that worked, and that had a high percentage of uptime. We already had Xserves and Xserve RAIDs, so we had most of the components to make it pretty simple to install an Xsan. But when we were evaluating SAN solutions, we were happy and excited that Facilis was offering a product that was 4-Gig fibre. Also, we did not want to have someone administering the SAN all the time. We’ve been up a year now, and the only administration that has had to occur is formatting a drive, and we actually did one software update in that time. Other than that, it’s just chugged along for us.

What exactly is the SAN hardware?

We’re using the TerraBlock 24D. I think we’re at 9.6 TB, and when we bought it, it was the highest capacity available. The system itself has two Atto Technology cards. We’re using a QLogic 4-Gig fibre-channel switch, all of the G5s it’s connected to are using Atto 4-Gig fibre cards, and all of them are running some version of the [AJA] Kona card: Kona 3, Kona LH, or Kona LHE.

How have you configured your storage?

Well, our projects begin and end in the same place. We have a centralized machine room with a render farm. So we’ve created four very large, 2 TB pools of storage that hold all the raw footage files – anything that’s been ingested in the machine room – and that are also used for writing render files and making outputs. And then there are a lot of small, 150 GB volumes that act as scratch disks for the individual edit suites. All the edit suites have read access to those large pools of data, and then write temporary files to a small volume that’s constantly erased.
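A rough sketch of how that layout carves up the TerraBlock’s 9.6 TB; the count of 150 GB scratch volumes is an assumption based on the capacity left over after the four pools, not a figure Martin quoted.

```python
# Approximate carve-up of the 9.6 TB TerraBlock as described above.
# The scratch-volume count is inferred from leftover capacity (an assumption).
TOTAL_TB = 9.6        # TerraBlock 24D capacity
NUM_POOLS = 4         # "four very large, 2 TB pools"
POOL_TB = 2.0         # each shared media/render pool
SCRATCH_GB = 150      # each per-suite scratch volume

pools_tb = NUM_POOLS * POOL_TB                          # 8.0 TB in shared pools
leftover_tb = TOTAL_TB - pools_tb                       # ~1.6 TB remaining
scratch_volumes = int(leftover_tb * 1000 / SCRATCH_GB)  # room for roughly 10 scratch volumes

print(f"{pools_tb:.1f} TB in pools, space for ~{scratch_volumes} x {SCRATCH_GB} GB scratch volumes")
```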

Depending on the importance of the project, we’ll switch between RAID 5 and RAID 1. The scratch disks for the individual suites, which store the project files, are always set to RAID 5 – it’s not a huge tax on the system. Typically we’ll keep the 2 TB volume pools at RAID 0 or RAID 1. We can kinda get away with that because we know we can be back up in six hours if we lost everything. We have an LTO-3 autoloader, so we can walk away and seven hours later have 3.6 TB back up.
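A quick back-of-the-envelope check on that restore figure, using LTO-3’s published native specs (400 GB and 80 MB/s per tape):

```python
# Sanity check on restoring 3.6 TB from an LTO-3 autoloader in about 7 hours.
RESTORE_TB = 3.6
RESTORE_HOURS = 7
LTO3_NATIVE_GB = 400        # native capacity per LTO-3 tape
LTO3_NATIVE_MB_S = 80       # native transfer rate; roughly 160 MB/s with 2:1 compression

sustained_mb_s = RESTORE_TB * 1e6 / (RESTORE_HOURS * 3600)   # ~143 MB/s required
tapes_needed = RESTORE_TB * 1000 / LTO3_NATIVE_GB            # ~9 tapes at native capacity

print(f"~{sustained_mb_s:.0f} MB/s sustained across ~{tapes_needed:.0f} tapes")
# ~143 MB/s is above LTO-3's 80 MB/s native rate but within its 2:1-compressed
# ceiling, so the seven-hour figure assumes compressible data or is a rough estimate.
```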

Most of our work is done in a tapeless workflow. Even if we’re shooting with the [Grass Valley] Viper or [Sony] XDCAM, a lot of it is recorded to data. All of that is backed up and stored on LTO-3 tapes. So we’re constantly writing and reading LTO-3 tapes.

So what kind of data-recording system would you use on a digital shoot?

Well, for an F950 it might be a portable edit suite. We have suites that we can actually road-case and ship. It depends on what the shoot is. When we’re shooting eight cameras, we’re shooting to tape. But if it’s a very controlled commercial or special-effects shoot, we might shoot to data just to be able to do comp plates on set.

How many projects could you be working on at one time?

As few as one, and as many as 30.

How close do you come to using all the capacity you have?

We’re usually running at about 9.2 of our 9.6 TB – typically within about 250 GB of being completely full across all of our storage.

Do you plan on expanding?

Right now we’re at about 42 TB. When the shit hits the fan, we run out and buy another drive, so our storage pool is kind of growing organically. At a certain point, yes, we do plan to add to our storage pool by buying a second TerraBlock unit – not just for the amount of storage, but for the bandwidth benefits.

What kind of situation taxes your system the most?

Last week, we did a concert for Chris Cornell. It was a project being done with Sympatico MSN in Canada and Molson Canadian. We had to shoot the show at the Commodore Ballroom in Vancouver on Monday night and deliver the show by Friday at noon. We were shooting with eight HVX200s, so we were dealing with data. We had to somehow get the data from Vancouver, move it onto the SAN, and get a two-hour concert cut in that time frame. So we had seven edit systems going double shifts, straight from Tuesday morning, when the DP flew into town, until Friday at noon. There was no time for anything other than editing, so everything had to be done once and happen correctly for us to make the deadline.

What format were you shooting?

It was 720p, 23.98 DVCPRO HD. The footage came to about 350 GB.
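That figure lines up with DVCPRO HD’s data rate. A rough check, assuming the HVX200s were recording 720p/23.98 natively to P2 at roughly 40 Mb/s per camera (the 100 Mb/s DVCPRO HD stream scaled to the 24 frames actually stored):

```python
# Rough check on the ~350 GB total for eight HVX200s over a two-hour show.
CAMERAS = 8
SHOW_HOURS = 2
MBIT_PER_SEC = 40   # approx. DVCPRO HD 720p/23.98 native (pN) rate per camera (assumed)

total_gb = CAMERAS * SHOW_HOURS * 3600 * MBIT_PER_SEC / 8 / 1000
print(f"~{total_gb:.0f} GB for the show itself")   # ~288 GB
# Pre-roll, soundcheck and anything shot beyond the two-hour cut would push
# the total toward the 350 GB quoted above.
```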

How did you organize the workload?

After the data got into Toronto and was transferred onto the SAN, the senior editor cut the entire concert up into songs and then fed projects out to each of the other editors, with their individual tracks. It’s a very simple project, and if you were doing it across four weeks it would be business as usual. But put the system under the pressure of everyone working very fast and pushing a lot of data back and forth, and it’s a different story – 350 GB isn’t a lot of footage, but pushed out to seven different suites simultaneously, it is. And one of the issues was that other things were going on. The system was not solely being used for Chris Cornell – we were also onlining a documentary.

How stable has the system been?

We’ve never been able to get the system to crash just from people pulling data. The only problem we’ve had comes from dealing with a lot of MXF files, which obviously means a lot of little files in a data structure. We once tried to delete 80,000 files at once off of a volume – without formatting it, just deleting the files. I think it came down to an issue with the Atto card. But we don’t often have to mass-delete that many files.
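For what it’s worth, one generic way to soften that kind of mass delete is to remove files in batches rather than all at once. A minimal sketch – not FrameBlender’s actual procedure, and the batch size and pause are arbitrary:

```python
import time
from pathlib import Path

BATCH_SIZE = 1000      # arbitrary chunk size
PAUSE_SECONDS = 2      # breather between batches so the client and storage keep up

def delete_in_batches(volume_root: str) -> None:
    """Delete every file under volume_root a batch at a time instead of all at once."""
    files = [p for p in Path(volume_root).rglob("*") if p.is_file()]
    for start in range(0, len(files), BATCH_SIZE):
        for path in files[start:start + BATCH_SIZE]:
            path.unlink(missing_ok=True)   # skip files already gone
        time.sleep(PAUSE_SECONDS)
```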