All the 3D footage in the video was captured with a Kinect camera and saved with custom software written in openFrameworks. The depth information is saved as a color-coded video. To decode it in the browser, I’ve written a shader for WebGL that transforms the color-coded video into a moving point cloud. The online experience also makes heavy use of ThreeJS, a great 3d engine for WebGL.
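In the browser the decoding happens in a GLSL shader, and the project’s actual color coding is custom, so take this as an illustration only. Assuming (my assumption, not the project’s spec) that each 16-bit depth value in millimetres is packed into the red (high byte) and green (low byte) channels, the decode and back-projection steps look roughly like this, sketched in C++:

```cpp
#include <cstdint>

// Hypothetical decode: assumes depth is packed into two 8-bit channels.
// The project's real encoding is custom and may differ.
uint16_t decodeDepth(uint8_t r, uint8_t g) {
    return static_cast<uint16_t>((r << 8) | g);
}

struct Point3 { float x, y, z; };

// Back-project a decoded depth sample at pixel (x, y) into a 3D point
// using a pinhole camera model (cx, cy = image centre; focal in pixels).
Point3 depthToPoint(int x, int y, uint16_t depthMm,
                    float cx, float cy, float focal) {
    float z = depthMm / 1000.0f;      // millimetres -> metres
    return { (x - cx) * z / focal,
             (y - cy) * z / focal,
             z };
}
```

Running this for every pixel of every frame is exactly the kind of embarrassingly parallel work a fragment/vertex shader is good at, which is why the browser version lives on the GPU.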
There are two songs to play with. In “TNT for Two,” you control the camera as the video cuts between shots of the band performing and a dreamy narrative journey from the sea to the forest.
In “These Are Conditions,” things are a bit more unconventional. You, the viewer, have complete control of the scene. You can grab the drummer’s head and swap it for the bass player’s while they’re both still playing and singing. You can duplicate the keyboard player’s limbs a dozen times, making a sea of arms writhing to the beat, or create a penguin and re-scale it so it’s 100 metres tall.
I’m hoping to clean up the Kinect recording software and release it for others to use. I’m looking into James George’s RGBDToolkit at the moment, and am hoping I can add to the great work his team have already done.
What I liked about the approach we used on this project is that we always worked with straight video, never OBJ files or PCD files or any other strange format. The depth is saved as video, and the texture data is saved as video. This meant that we could edit it in Final Cut and do clean-up in After Effects.
Neither of us is very experienced with 3D editing software like Maya or Max, and luckily we never touched either during the entire project. I think many filmmakers are in the same boat, so if we can come up with a workflow that fits in with the tools they already know, then we’ll start to see some really interesting work come out of this technology. I feel like we’re just scratching the surface of something big here. Depth is a new dimension for filmmaking… at the moment it can feel like a gimmick, but so did sound and color when they were first introduced to cinema.
I’ve got a few more ideas on how to improve the interaction for “These are Conditions.” I think the most helpful one would be a simple way to save your creations. I haven’t built it yet because I’m not sure if anyone would use it. But if you think it’s a good idea, let me know and I’ll get to work!
In this post I’ll explain how the visuals came together for the V Motion Project. For an overview of the Kinect controlled instrument and how it works, see Part I – The Instrument. You can watch the music video here.
The climax of the project was a performance in downtown Auckland, with speaker stacks blaring and a set of giant projectors shooting visuals at a 30 metre by 12 metre wall in front of the motion artist. Three different film crews were there shooting the event for a TV commercial, a music video, and a live TV feed.
The projected visuals needed to be both spectacular and informative. It was important that anyone watching the show could instantly see that Josh was creating the music with his movements, not just dancing along to the song. This meant we needed to visually explain what he was doing at all times, how the song was being built up bit by bit, and how each motion was affecting the music. We also needed to put on an exciting show, with visual fireworks to match the arc of the song.
A shot from the performance
Matt’s early concept sketch
Design and motion graphic maestros Matt von Trott and Jonny Kofoed of Assembly led the design effort. This early concept sketch shows the main elements of the visuals. The motion artist’s digital avatar, a giant ‘Green Man’, is center stage surrounded by a circular interface. Around him a landscape grows and swells as the track progresses. The motion artist stands directly in front of the wall, so when viewed from behind his silhouette is sharply defined against the glowing backdrop.
I wrote the software to run the visuals in C++ with openFrameworks. For the performance, it ran on a Mac Pro with 16 GB of RAM and an NVIDIA video card with 1 GB of VRAM and three outputs. The main visual output for the wall was 1920 x 768, a secondary output for the other building was 432 x 768, and a third video-out displayed a UI so I could monitor the machine and tweak calibrations. The A/V techs at Spyglass furnished the projectors, generators, and speakers for the performance. The main visuals ran to their Vista Spyder, which split the display across two 20K projectors and took care of blending the overlap.
Feeding the Machine
As I described in my previous post, the instrument runs off two separate computers: one in charge of the audio side of things, the other the visuals. The first challenge was to get as much realtime data as we could out of the audio machine and into the visuals machine, so we could show what was happening with every aspect of the instrument. We did this by sending JSON data over UDP on every frame. The audio system tells the video system where the player’s skeleton joints are in space, which set of audio controls they are manipulating, and the current state of those controls. The visuals system also takes an audio line-out from the external sound card and processes the sound spectrum. The visuals system is in charge of the “air keyboard,” so it already knows the state of the keyboard and which keys are down.
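The per-frame messages are just small JSON objects. The project’s actual schema isn’t published here, so the field names below are illustrative guesses, but a sketch of building one status message (before handing it to a UDP socket) might look like this:

```cpp
#include <sstream>
#include <string>

// Build one per-frame status message to send over UDP. Field names are
// illustrative, not the project's real schema; the audio machine was
// actually written in Processing, but the idea is the same on either side.
std::string buildFrameMessage(float headX, float headY, float headZ,
                              const std::string& activeControl,
                              float controlValue) {
    std::ostringstream os;
    os << "{\"head\":[" << headX << "," << headY << "," << headZ << "],"
       << "\"control\":\"" << activeControl << "\","
       << "\"value\":" << controlValue << "}";
    return os.str();
}
```

Sending a full snapshot of state every frame (rather than deltas) also means a dropped UDP packet doesn’t matter much: the next frame’s message replaces it 1/30th of a second later.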
Layers within Layers
To help us understand how the visuals were going to come together, I organized the realtime rendering into distinct layers, stacked up and blended together for the final image, just like a Photoshop or After Effects file. This way we could divvy up the layers to different people and update the system with their latest work as we went along. The illustration below shows the 4 layers of the visuals: the background ‘landscape’, the machine, the Green Man, and the interface elements.
To combine the layers, Matt and Jonny wanted to use the equivalent of Screen blending in Photoshop. This gives a great look to the vector-monitor style of the visuals (like the old Battlezone or Star Wars arcade machines), since the lines get brighter where they overlap. It also turned out to be a great technique for speeding up realtime rendering. Screen blending works in a similar way to additive blending: black has no effect on the output. This meant we didn’t need an alpha channel anywhere; we just rendered graphic elements against a black background. By getting rid of the alpha channel, we saved 8 bits per pixel, which, when you’re rendering 2 million+ pixels per frame, can really add up.
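Screen blending itself is a one-liner per channel. With colors normalized to [0, 1], the result is 1 − (1 − a)(1 − b): black is a no-op, and overlapping bright lines push toward white.

```cpp
// Screen blend for one normalized color channel:
//   result = 1 - (1 - a) * (1 - b)
// Black (0) leaves the other layer untouched, which is why the layers
// could be rendered on black with no alpha channel at all.
float screenBlend(float a, float b) {
    return 1.0f - (1.0f - a) * (1.0f - b);
}
```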
Matt and the team spent weeks crafting a series of epic environments and effects that correspond to the build-up of the song. Because we knew the footage of the performance needed to match the radio edit of the song for the music video to work, we were able to lock down the exact timing of the musical transitions and Matt used these to orchestrate the visuals of the landscape. The final render took over 3 hours on Assembly’s massive render wall.
In the visuals software, the landscape layer contains a large video that’s precisely synchronized to the music. A big challenge was starting the video at the exact moment Josh “kicks in” the track. We do this by sending a signal from Paul’s music system to my visuals machine over UDP. The playback needed to be fast and stable, so I used QTKit to play the video. It’s 64-bit, hardware accelerated, and runs in a separate thread, so it did the job fantastically, playing the 1.9 GB Motion JPEG QuickTime file at a steady 60fps.
The Green Man
For the motion artist’s on-screen representation, we wanted a glitchy/triangulated look that matched the rest of the visuals. Creating the effect turned out to be fairly simple thanks to the silhouette that you can easily get out of the Kinect’s depth information. You simply ignore all the pixels of depth information beyond a set distance from the camera (4 metres in our case). If you don’t have any obstructions around the subject, you get a very clean silhouette of the person.
Left: Early style frame for Green Man, Right: final Green Man
To create the triangle effect, I just run the silhouette through a few stages of image processing. I use OpenCV to find the contour, then simplify it by taking every 50th point. I then run this series of points through a Delaunay triangulation. I do one more check against the original depth information to decide what color to fill each triangle: triangles are lighter the nearer they are to the camera, and also lighter from top to bottom. This makes it feel like the figure is being lit from above, giving it a subtle sense of form.
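The two-factor shading rule is easy to sketch. The exact ranges the project used aren’t stated, so the 0.5–4 m depth range and the strength of the top-to-bottom falloff below are my assumptions:

```cpp
// Shade for a filled triangle in the Green Man effect: nearer to the
// camera -> lighter, and higher on screen -> lighter, faking a light
// from above. depthM is in metres, yNorm is 0 at the top of the figure
// and 1 at the bottom. Ranges and falloff strength are assumptions.
float triangleShade(float depthM, float yNorm) {
    float nearM = 0.5f, farM = 4.0f;
    float d = (farM - depthM) / (farM - nearM);  // 1 at near plane, 0 at far
    if (d < 0.0f) d = 0.0f;
    if (d > 1.0f) d = 1.0f;
    float light = 1.0f - 0.5f * yNorm;           // fades toward the bottom
    return d * light;                            // 0 = black, 1 = brightest
}
```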
The User Interface
The user interface of the instrument has a huge job to do. It needs to very clearly show how the instrument works and what Josh is doing to create the music. Jonny and the guys spent a lot of time trying different ideas and looks for the interface. It was important to us that nothing about it was fake, and that every element had a purpose.
The visuals for the interface are created by a combination of pre-rendered transitions, live elements drawn with code, and sprite animations triggered by the motion artist’s actions. For instance, when Josh hits a key on the keyboard, I draw a “note” on the score that’s ticking away above him as well as fire off a sprite animation from the key that he hit. If you watch the video closely, when he’s finished creating a sequence of notes, it gets sucked into a little reel-to-reel recorder in the bottom left which then loops it back.
Each different tool that Josh uses to create the music needs to be visualized in a slightly different way, to help explain what he’s doing.
The keyboard keys change color when hit, and the notes he’s played are drawn in an arc across the interface. The second control is nicknamed “Dough,” because the motion artist uses it to knead and shape the sound: the ball (and the sound) grows when his hands are far apart, and shrinks when they’re close together. The rotation of the ball affects the sound as well. When he’s controlling the LFO (that distinctive dubstep “wobble” effect), we draw yellow arrows moving at the same frequency as the audio oscillation, which he pulls up and squashes down.
The sprites that come out of each keyboard key have a distinct look. This video shows how each animation was designed to match the sound of its sample.
And here’s a bit of code that I used all over the place for drawing the live gauges, midi tracks, keyboard keys and other elements into the circular interface. Fight the tyranny of right angles!
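The snippet itself isn’t reproduced in this version of the post, but the core idea behind all of those circular elements is the same: lay everything out by angle and radius around the interface centre instead of by x/y. A minimal equivalent (names are mine, not from the project’s code):

```cpp
#include <cmath>

struct Vec2 { float x, y; };

// Place an interface element on the circular HUD: convert a radius and
// angle around the centre into screen coordinates. Gauges, MIDI-track
// segments and keys can then all be laid out by angle.
Vec2 polarToScreen(float centerX, float centerY,
                   float radius, float angleRad) {
    return { centerX + radius * std::cos(angleRad),
             centerY + radius * std::sin(angleRad) };
}
```

Sweeping the angle while stepping the radius gives you arcs, ticks, and ring-shaped meters with no right angles anywhere.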
This technology is amazing and it feels like the tip of the iceberg. I’d love to see v2.0.. multiple musicians, flexible loop editing, realtime VJing. And as new cameras come out with higher frame rates and higher resolution, the responsiveness and power will get better and better. It’s going to be fun!
The Motion Project was a collaboration between a lot of clever creative people working together to create a machine that turns motion into music. The client for the project, Frucor (makers of V energy drink), together with their agency Colenso BBDO, kitted-out a warehouse space for this project to grow in and gathered together a group of talented people from a number of creative fields.
Producer Joel Little (Kids of 88, Goodnight Nurse) created the music, musician/tech wiz James Hayday broke the track down and wrestled it into Ableton, and Paul Sanderson of Fugitive built the tech to control the music with the help of Mike Delucchi. I also helped with the music side of the tech and built the visuals software with motion graphics warlocks Matt von Trott and Jonny Kofoed of Assembly. Assembly also produced and directed the music video. Hip-hop/tap dancer Josh Cesan was the ‘Motion Artist’ tasked with playing the machine and Zoe Macintosh of Thick As Thieves documented the process. Heaps of other people were involved along the way. It was a true collaboration, with everyone contributing ideas and energy to the process.
The final design of the machine was the result of months of experimenting… it all makes perfect sense in hindsight, but in the beginning, none of us were really sure what we were building. The starting point was the idea of using the Kinect camera to detect the movements of the musician. Paul spent months playing with the Kinect and looking at what other people had done using it to create music. He used Ben Kuper’s brilliant BiKinect project as a starting point and heavily modified it into something even more powerful. This software allows a musician to control the powerful audio software Ableton Live with nothing but the movements of their body.
Unfortunately, this control and flexibility comes at a price… the system has significant lag. This lag doesn’t affect many of the things you might want to do with the instrument, such as tweaking a filter setting or triggering a loop to start on the next measure. But it does make it nearly impossible to use the system as a drum-pad or keyboard. When there’s a delay between hitting a key and hearing a sound, it’s really, really hard to play a melody, and even harder to play a drum beat in time.
Cracking the problem of playing notes in ‘real-time’ was one of the first major hurdles we overcame. In my previous work with the Kinect, I hadn’t experienced the lag we were seeing with the BiKinect solution. I realized that the lag was happening because the OpenNI drivers it uses do a heap of maths to process image data and calculate the position of each joint of a person’s skeleton, 30 times a second. I’d been working with raw depth data straight out of the camera without the skeleton processing, so there were no intermediaries to slow things down.
To create a system that could give us the control and flexibility of the skeleton drivers, and the real-time speed of the raw-depth method, we decided to use two Kinects running different software on two different computers. A good idea, but it brought with it a whole new set of problems.
Two Hearts Beat as One
Below is my highly technical drawing of the instrument. The Motion Artist is on the left, and the Kinects are pointed at him from straight ahead. One Kinect is using OpenNI drivers to calculate his skeleton position. The other Kinect uses freenect drivers to access the raw depth data from the infrared sensor. The top computer is Paul’s music system, which is built in Processing on a Windows PC. The bottom computer is the Mac Pro running my visuals system written in C++ with OpenFrameworks. Mac and PC happily working together, hand in hand… could world peace be far behind?
The two computers talk to each other over UDP, sending simple JSON objects back and forth on each frame. The music system sends data about its current state (the user’s skeleton position, what sound the user is manipulating, etc.) to the visuals system. The visuals system sends MIDI info back to the music system when the user presses an “air keyboard” key, as well as timing info to help the motion artist nail the song transitions (something we programmed in on the morning of the performance).
The visuals computer outputs to 3 displays. One output went to a normal monitor for my debugging/monitoring UI, another was split with a Vista Spyder across two 20K projectors to cover a 30 x 11.5 metre wall. The final display out went to another 20K projector aimed at a second building.
The music system outputs to an M-Audio external sound card hooked into a large speaker setup. The sound card also has a line-out that plugs into the visuals machine, which does FFT audio analysis for some simple sound visualization.
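For the analysis step, a production app would use a real FFT (openFrameworks exposes one through `ofSoundGetSpectrum`), but the underlying computation is easy to show with a naive DFT — one magnitude per frequency band, ready to drive a visualization. This is a sketch of the concept, not the project’s code:

```cpp
#include <cmath>
#include <vector>

// Naive DFT magnitude spectrum, for illustration only. O(n^2), so fine
// for a demo but not for a realtime app -- use an FFT there instead.
std::vector<float> magnitudeSpectrum(const std::vector<float>& samples) {
    const double PI = 3.14159265358979323846;
    size_t n = samples.size();
    std::vector<float> mags(n / 2);
    for (size_t k = 0; k < n / 2; ++k) {
        double re = 0.0, im = 0.0;
        for (size_t t = 0; t < n; ++t) {
            double phase = 2.0 * PI * k * t / n;
            re += samples[t] * std::cos(phase);
            im -= samples[t] * std::sin(phase);
        }
        // Normalize so a full-scale DC signal gives magnitude 1.
        mags[k] = static_cast<float>(std::sqrt(re * re + im * im) / n);
    }
    return mags;
}
```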
Don’t Cross the Streams
You’re not supposed to aim Kinects at each other. If you do, the proton fields reverse and the universe implodes on itself… or at least that’s what we’d heard. If you give it a try, the world doesn’t end, but the cameras definitely don’t like it. The depth data you get is full of interference and glitchy noise. Luckily, there’s an incredibly simple way to fix it.
Matt Tizard found a white paper and video that explained an ingenious solution: wiggle the cameras. That’s it! Normally, the Kinect projects a pattern of infrared dots into space. An infrared sensor looks to see how this pattern has been distorted, and thus the shape of any objects in front of it. When you’ve got two cameras, they get confused when they see each other’s dots. If you wiggle one of the cameras, it sees its own dots as normal but the other camera’s dots are blurred streaks it can ignore. Paul built a little battery operated wiggling device from a model car kit, and then our Kinects were the best of friends.
Controlling the Music
The music system works by connecting the Kinect camera to Ableton Live, music sequencing software often used by DJs and musicians during live performances. Below is a screen capture of our Ableton setup. The interface is full of dials, knobs, switches, and buttons. Normally, a musician would use a physical control panel covered with knobs, dials, and switches to control Ableton’s virtual ones. Paul’s music system lets us map body movements to Ableton’s controls instead. For example, touching your head with your left hand could start a certain loop. Or you could control a dry/wet filter with the distance between your hands. This ability to map physical motion to actions in Ableton is enormously powerful.
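The “distance between your hands drives a dial” mapping boils down to rescaling a physical measurement onto MIDI’s 0–127 controller range. The 0.1–1.2 m range below is an assumption for illustration, not a value from the project:

```cpp
// Map a physical measurement (e.g. the distance between the hands, in
// metres) onto a 0-127 MIDI controller value, the way a body movement
// can be bound to one of Ableton's dials. The input range is assumed.
int handsToMidiCC(float distanceM) {
    float minD = 0.1f, maxD = 1.2f;
    float t = (distanceM - minD) / (maxD - minD);
    if (t < 0.0f) t = 0.0f;    // clamp: hands closer than the range
    if (t > 1.0f) t = 1.0f;    // clamp: hands wider than the range
    return static_cast<int>(t * 127.0f + 0.5f);  // round to nearest step
}
```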
Video: The Ableton Live setup for the Motion Project
Video: The skeleton based audio control system. In this example, the distance between the hands turns one dial in Ableton, and the angle of rotation controls another one.
The Air Keyboard
Getting the “air keyboard” to work alongside the gesture-based system was a big breakthrough, and gave us the speed we needed to play notes in real-time. The video below shows the evolution of the idea. In my first test, I made the keyboard work as if it were a giant button in front of the player: when you pushed your hand into the air in front of you, the key would trigger. Next I tried breaking the space in front of the player into a series of boxes that you could play quickly, like a xylophone or harp.
This ‘push forward’ technique worked well and was fun to play, but it was hard to play a specific melody or an intricate beat. There was a bit of lag introduced by the image detection I was doing to spot the ‘key’ presses, but the main difficulty was the lack of feedback to the player. When you play a real keyboard or drum, your hand strikes the surface and stops. When you play an arbitrary square of air in front of you, it’s hard to intuitively know when you’re close to the target, have hit it, or are totally off. We displayed visual cues on screen, but it was always a bit like groping in the dark.
The big breakthrough came when I moved the keys to the sides of the player. When you fully extend your arm to play a note, there’s the physical feedback of your joints extending to their limits, so playing the instrument at speed becomes much more natural. Also, to detect a “hit” or “miss,” all I need to do in code is look at the silhouette of the player and check each rectangular key: if the key has more than 20 pixels of silhouette in it, it’s down; otherwise, it’s off. It’s super fast and responsive.
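The hit test really is that cheap. A minimal sketch of the check, assuming a binary silhouette mask in row-major order (the 20-pixel threshold is from the text; everything else is my framing):

```cpp
#include <cstdint>
#include <vector>

// Check one rectangular "air keyboard" key against the player's
// silhouette mask: the key is down if more than `threshold` silhouette
// pixels fall inside its rectangle (20 pixels in the final instrument).
bool keyIsDown(const std::vector<uint8_t>& mask, int imgW,
               int keyX, int keyY, int keyW, int keyH,
               int threshold = 20) {
    int count = 0;
    for (int y = keyY; y < keyY + keyH; ++y)
        for (int x = keyX; x < keyX + keyW; ++x)
            if (mask[y * imgW + x]) ++count;
    return count > threshold;
}
```

Because it only counts pixels in a handful of small rectangles, this runs in a fraction of a millisecond per frame, with no skeleton tracking in the loop.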
The final piece of the puzzle was to define what the keyboards look like and how they behave. We defined the layout of each keyboard in a JSON file that lives on the PC. Paul wrote a little Air app to let us tweak positioning and define which midi notes each key should trigger. This was really helpful as we worked with Josh to create a keyboard to fit the way he wanted to move. So the end result is a giant MIDI “air keyboard” with keys that we can create and position anywhere in space.
Putting it All Together
This video demonstrates the instruments available to the motion artist. “Vox” and “Bass” are keyboards. “LFO” controls the low-frequency oscillation (that distinctive dubstep ‘wobble’). “Dough” uses two filters, one controlled by the distance between his hands and the other by the rotation of the ‘ball of dough’ he’s creating. “Drums Filtered” is a drum keyboard, but the sounds become filtered as his chest gets lower to the ground. There’s one element of the performance that’s not in this test: the moment where Josh looks like he’s stretching out a giant triangle. Here, the distance between his two hands and the ground provides the control. As he pulls the sound up, it gets louder, and as he pulls his hands apart, the dampening filter decreases, causing the sound to “open up”.
We were just playing around during this test, so his rhythm is a bit off at times
Another nice thing about dividing the system into two symbiotic parts was that Paul and I were able to split off and work in different directions once we had nailed down the structure of the system and the communication method we’d use to pass data back and forth. Paul worked at the warehouse with the musicians and performer to get the flow of the song nailed down and tune the instrument so Josh could actually play the thing (he makes it look easy in the video, but it’s definitely not). I moved to the Assembly office to work with Matt and Jonny on adding a bit of sparkle and spectacle to the visual side of things.
[Update: Paul has posted an article detailing the project from his point of view. He has lots of good insight into the music tech and describes how the instrument evolved over the months. Definitely check it out!]