Making a 2D Image Look Like 3D - No Glasses Required

Is it possible to create a 3D image live, using a single camera? In a word, no – there are complicated post-production techniques that can make a flat image stereoscopic, but they don’t work in camera. But a company called V3 has spent the last 12 years perfecting a process for adding a compelling sense of depth to 2D images during production by shifting parallax inside a standard lens. In other words, the lens’s perspective on the image shifts subtly, frame by frame or field by field, as the camera rolls. Your brain then interprets those multiple perspectives over time as depth information.

If you watched coverage of the Final Four basketball tournament on CBS this year, you’ve seen V3 footage in action. Now, with systems available for rental at Clairmont Camera, the company is gearing up a full-court press to try to get V3-capable lenses into broadcast trucks from coast to coast. Company founder Chris Mayhew cautions that your mileage may vary – it seems that not everyone's brain processes depth in the footage, and not all footage responds to the process in the same way. (To this editor's eyes, some of the footage pops into startling 3D relief, while the effect is more elusive in other clips.) Watch the clip, below, to see some samples along with Mayhew's audio commentary. Then read F&V’s Q&A to get more detail on the V3 system.

Downloadable Podcast

Right-click to download podcast.


FILM & VIDEO: What makes the V3 system work?

CHRIS MAYHEW: We’re taking parallax, which is the three-dimensional component of your visual system, and incorporating it over time into standard moving-image streams in a way that will trigger the assumption of dimension in your brain. Stereoscopic imagery can be really fabulous, but it doesn’t look like reality. A lot of the time, the people in stereo scenes look like cardboard cutouts. That’s because you’re giving the brain a certain amount of information without giving it other information – the vertical parallax component – that it would normally use to construct a perception.

We provide what’s called a MOE lens, for moving optical element. The lens has an iris inside it that moves in a simple circle. You can adjust the amplitude – how far off the center it moves. And it moves at a particular frequency, which happens to be 4.3 cycles per second. As it moves around, it sees different points of view relative to the plane of focus, and it’s capturing those in every field. You get slight shifts between foreground and background objects pivoting on the plane of focus. Your brain makes an assumption that those shifts are depth and translates it that way.
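As a rough illustrative model – not V3's actual mechanism, and with a 60-fields-per-second sampling rate assumed for NTSC-style video – the iris path Mayhew describes can be sketched as a point circling the lens axis at the scan frequency, sampled once per field:

```python
import math

def iris_position(t, amplitude_mm=2.0, freq_hz=4.3):
    """Model the moving iris as a point on a circle of radius
    `amplitude_mm` around the lens axis, rotating at `freq_hz`.
    Returns the (x, y) offset of the iris center in millimeters.
    (Hypothetical helper; amplitude and frequency values are
    taken from the interview.)"""
    theta = 2 * math.pi * freq_hz * t
    return (amplitude_mm * math.cos(theta), amplitude_mm * math.sin(theta))

# Sample the iris offset once per field over one second of 60i video,
# giving the slightly different viewpoint captured in each field:
positions = [iris_position(n / 60.0) for n in range(60)]
```

Each sampled offset corresponds to a slightly different point of view relative to the plane of focus, which is what produces the field-by-field foreground/background shifts described above.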

We look a lot at what nature does. You have bugs and birds that don’t have mental capacity for heavy visual processing, and yet they can do very complex things. They have specialized irises and specialized movement to capture information that is, in a sense, pre-processed before their brain even has to think about what they’re seeing. A pigeon has eyes on the side of its head; it doesn’t have binocular vision like we do. So it doesn’t see three-dimensionally, yet it can fly through the air and land on a telephone line. If you see them when they walk, they bob their heads. They’re basically rangefinding. They’re taking different points of view over time. They can sense shifts where things are close and things are far away, and detect threats, without having to recognize what they’re looking at. They can tell if something’s close and react accordingly. It turns out that people have the same capacity. That’s what we’re exploiting.

People who work with stereoscopic capture can manipulate things like the distance between “eyes” to tweak the 3D qualities of their image, but it seems like this system doesn’t have so many variables.

It does, in the sense that you can change the frequency and the amplitude. If you’re going to shoot at the same speed you display at, you leave the frequency at 4.3 cycles. For the Final Four, where they were capturing at 90 frames a second to play back at one third speed, you had to scan three times faster, or 12.9 Hz. When you cut back to one-third, you’re playing at 4.3. It’s the playback speed that’s critical, not the capture speed.
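The arithmetic behind that adjustment is simple to sketch. `scan_frequency` is a hypothetical helper, with the 4.3 Hz playback target taken from the interview:

```python
def scan_frequency(playback_hz, capture_fps, playback_fps):
    """Scan frequency needed at capture time so that the parallax
    cycle comes out at `playback_hz` after the speed change.
    (Hypothetical helper illustrating the interview's arithmetic.)"""
    return playback_hz * (capture_fps / playback_fps)

# Normal speed: capture and playback rates match, so scan at 4.3 Hz.
normal = scan_frequency(4.3, 30, 30)       # 4.3 Hz

# Final Four super slo-mo: shot at 90 fps, played at 30 fps
# (one-third speed), so scan three times faster: 4.3 * 3 = 12.9 Hz.
slo_mo = scan_frequency(4.3, 90, 30)       # 12.9 Hz
```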

The amplitude control is like the interocular distance in stereo. With our system, everything is on the fly. You can turn this off while you’re shooting, or you can just put a little bit on, or you can turn it way up. It’s not an either/or thing – we’re just giving you another tool to put in your bag of tricks. Creative people will learn how to use this to their advantage and do creative things with it.

What exactly are you adjusting when you change the amplitude?

The iris can actually be positioned anywhere inside the effective aperture of the lens. Let’s say you have an f/1.4 lens, and you set your shooting stop at f/16. Now you’ve got a small hole inside a much bigger pipe. You can move that hole – you can step it 2mm to the right and then move it in a 4mm-diameter circle around the center of the lens. So you’re getting a 4mm disparity at the point of focus. That’s how you get the shift.
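A back-of-the-envelope check of that geometry, assuming a hypothetical 50mm focal length (the interview doesn't specify one) and the standard entrance-pupil relation D = f/N:

```python
def pupil_diameter_mm(focal_length_mm, f_number):
    """Entrance-pupil diameter from the standard relation D = f / N."""
    return focal_length_mm / f_number

# Assumed 50 mm lens (focal length is not given in the interview):
full_pupil = pupil_diameter_mm(50, 1.4)      # ~35.7 mm: the "big pipe"
shooting_hole = pupil_diameter_mm(50, 16)    # ~3.1 mm: the small hole at f/16

# A 2 mm offset from center traces a 4 mm-diameter circle, giving a
# 4 mm peak-to-peak disparity at the plane of focus. The offset hole
# must stay inside the wide-open pupil:
offset_mm = 2.0
assert offset_mm + shooting_hole / 2 < full_pupil / 2
```

Under these assumed numbers, the f/16 hole has plenty of room to orbit inside the f/1.4 pupil, which is why the amplitude can be dialed up or down on the fly.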

And that’s roughly equivalent to interocular distance?

Yes, very roughly, but we go to great lengths not to claim that this is three-dimensional. This certainly is not stereoscopic. It’s just regular television or regular movies, but dimensionally enhanced.

It reminds me of animated GIFs I’ve seen on the Internet that gain the illusion of depth by switching rapidly back and forth between two perspectives on one image.

That’s a technique called square-wave switching. It uses alternating pairs. That’s where I started out, and we still do that in some cases for Internet images. When you need to have something that loads fast and you want to reduce it to the smallest number of frames, you can revert to techniques like that. That phenomenon has been known for a very long time, but it was never perfected because people lacked the ability to control the alignment precisely. We make an image-analysis tool for the government that exploits the same technique, but it has the ability to do sub-pixel alignment so you can critically align images and then see stuff in dimension. Again, they’re exploiting this parallax-over-time concept.

How does the V3 system work with existing lenses?

We made a line of 35mm PL-mount lenses – they are available at Clairmont Camera in North Hollywood – for feature films and TV advertising shot on 35mm film or high-end digital cameras. Then we made the AX2, which goes into HD zoom lenses. It works with the Angenieux 19x and 26x lenses and the Fujinon 18x and 22x lenses. You buy what Angenieux sells as a “V3-friendly” lens. The 2x extender has been removed and another piece has been substituted in that position. You can stick our AX2 into that, and the lens becomes V3-capable.

It works the same with Fujinon lenses. That’s what CBS used on the Final Four. The standard CBS camera guys went out and shot as they normally do. They didn’t have to do anything special, just zoom and focus. We were on the Sony super slo-mo cameras, the HDC-3300s. They capture three 30-frame feeds that are 120 degrees out of phase with one another. The camera puts out a normal 30-frame signal, and then all three feeds can be fed into an EVS system that plays back at 90 frames. For the first three games we were set for V3 in slo-mo, and all the slow-motion playback from those two positions, under each net, was enhanced with V3. But the normal feed that went out, scanning at 12.9 Hz, was just good and sharp. On the last game, the championship game, we turned the V3 scan to correspond to the 30-frame feed. Any shots showing the guys shooting foul shots from under the net were visually enhanced. It went out all around the world, and nobody got headaches or anything like that. We just wanted to make sure it all worked. And CBS came back and said they wanted to do a bunch more stuff with us.
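As a minimal sketch of how three phase-offset 30-frame feeds combine into 90-frame playback (an assumption about the general workflow, not the EVS system's documented internals):

```python
def interleave_feeds(feed_a, feed_b, feed_c):
    """Interleave three 30 fps feeds whose exposures are staggered
    by one third of a frame interval (120 degrees of phase, i.e.
    1/90 s apart) into a single 90 fps sequence. Assumes feed_a
    leads, feed_b trails by 1/90 s, and feed_c by 2/90 s."""
    out = []
    for a, b, c in zip(feed_a, feed_b, feed_c):
        out.extend([a, b, c])
    return out

# Toy example with frame labels standing in for real frames:
print(interleave_feeds(["a0", "a1"], ["b0", "b1"], ["c0", "c1"]))
# ['a0', 'b0', 'c0', 'a1', 'b1', 'c1']
```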

How much does this add to the cost of shooting?

It might add as much as 20 or 25 percent to the rental. But it’s hard to say how companies are going to package and price it.

Are there types of footage this works particularly well for? It seems like some of your most striking examples involve a moving camera.

For most of our footage, the camera is moving all the time. You can turn the scan up much higher. If you have a locked-down camera and you start to increase the amplitude, at some point you’ll see [the changing parallax]. But if you move the platform, you won’t see it, because of another psycho-physical phenomenon that has to do with stabilization of the environment. And if you look at television, everything is always moving. And a move in this case can be very slight: a zoom, or just handheld. For the Final Four, the guys were sitting in chairs holding cameras, and their subtle body movements were enough to mask the scan. But motion is also a very strong depth cue, and we want to take advantage of that. You take advantage of all of the monocular cues you would get and then put parallax on top of that.

Generally this works when you have a group of objects that are close to the camera that you can compare with something that’s far away. It wouldn’t work well if you went to the beach and pointed the camera at the sunset. There’s nothing there to see. But if you put somebody in a chair with an umbrella close up, and then compared them to the sunset, it would look dimensional. You can track straight into them or you can track sideways.

How do people react to the footage?

Everybody says it looks sharper. It’s really not sharper; it’s just perceived as sharper because you have more information. And this works well in a mobile environment. When I was out at CineGear, I demonstrated this to a lot of people on my video iPod. It works really well. Some people think it looks better on a small screen than a big screen.

There is a significant percentage of the population that we refer to as jump-up-and-downers. This triggers something in their visual system, and they really see it. They get very excited about it, because it really works for them. For another 50 percent of the population, there’s a learning curve. The more they watch, and the more they learn to use the information we give them, the better it gets. About 20 percent of the population will say it looks sharper, but they don’t necessarily perceive added dimension. I can’t explain that, but it tracks statistically if you look at stereopsis. People who can’t see stereo are somewhere between 10 and 20 percent of the general population.

You mentioned it works on mobile devices, and CBS is using it for sports coverage. What other applications do you have in mind?

If you’re going to do an advertisement, where are you spending most of your money? You’re spending it on buying airtime, not so much on the production. And you’re putting it in an environment where the advertisers buying the airtime before and after yours are buying the same eyeballs at the same time, and probably spending just as much on production as you are. If you shoot something in V3, it will look distinctly different from what’s on either side of it. People won’t know why. It will just look different. That’s where the value is.

We can also do this synthetically. We can do this with computer animation. We have a plug-in for Lightwave that manipulates the virtual camera the same way we do a real camera. We also have just now done code for game-rendering engines. That’s probably our most sophisticated product right now. It’s totally seamless to the user. You run around in the game, and whatever you jump or shoot or blow up, it calculates the best angle of parallax at that moment in time. It’s implemented in the rendering engine, and it doesn’t slow down the game at all. We’re going to release a game on the Internet later this year for free.

How long have you been doing this?

We were able to do 35mm motion picture film in 1995. We shot a sequence in the film Selena with [cinematographer] Ed Lachman down in San Antonio. A sequence at the end of the movie, where she’s singing “Dreaming of You,” was shot with three prototype MOE lenses. They were the first ones. They weren’t PL-mount lenses, so it wasn’t very practical for production. The director, Gregory Nava, wanted it to be hyperreal compared to the rest of the film. It went over really well. That was done with an iris that had four blades. It was an analog system. It took us nine years to get the DPS – the digital parallax scanner, which is the basis of all the systems we have now.

For more information: