The issues that I raised in my two previous posts about software — both posts essentially about different views of the “openness” of software — came into sharp focus last month during the shoot for Housebound.
We’ve always had an improvisatory approach to workflow — partly out of necessity (there’s seldom any time to properly “rehearse” projects) and partly through choice (pieces are often only found “in the making of them” rather than by careful planning, and conversely rehearsals tend to become pieces). Depth is a series of works that marks a departure for OpenEnded — they are all films of sorts; the first, Housebound, even has a screenplay. That said, each of them aims to invent a new way of capturing the world photographically: the series concerns inventing cameras, which, in turn, necessitates inventing workflows.
As such, Housebound was recorded in stereo using this elaborate contraption:
A moment of sheer gear-headedness: the mount is borrowed from Alain Derobe, a 3d-film specialist in Paris, and uses a half-silvered mirror to let you get the cameras effectively closer together than their bodies would allow (we continue to be confused by images found online of rigs for sale with very wide baselines — are these for landscapes?). The cameras are a pair of Sony XDCAM-EX1s that send HD-SDI out to a pair of Blackmagic DeckLink cards in a pair of Mac Pros, each with its own three-disk RAID 0 array, to record the whole affair direct to glorious uncompressed 1080p30, 10-bit 4:2:2. And I do mean glorious: these cameras, recorded this way, have an “authority” that to date I’ve only associated with film.
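Back-of-the-envelope, the appetite of those RAID arrays is easy to check (a quick Python sketch of the arithmetic; 10-bit 4:2:2 averages 20 bits per pixel):

```python
# Uncompressed 1080p30, 10-bit 4:2:2: per two pixels, 2 Y + 1 Cb + 1 Cr
# samples at 10 bits each, i.e. 20 bits per pixel on average.
bits_per_pixel = 20
frame_bytes = 1920 * 1080 * bits_per_pixel / 8     # ~5.2 MB per frame
per_camera_mb_s = frame_bytes * 30 / 1e6           # ~155 MB/s per camera
print(per_camera_mb_s, 2 * per_camera_mb_s)        # the pair: ~310 MB/s to disk
```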
Regardless, the piece calls for the careful 3d insertion into the footage of text fragments that seem to hover in space, but in real space (that is, they obey the physics of our hand-held camera material).
Attaching text to an object in the video footage is a relatively straightforward point tracking problem — you have the typical redundant non-choice of tools: Motion, After Effects, Nuke, didn’t Maya have a point-tracker somewhere? I had vaguely assumed (that is: was planning to improvise around) that we could make the tracking super-easy by introducing markers into the scene and then removing them in post. I had assumed that this would be straightforward (despite the absence of a background plate given our moving camera) because after all, we know where the marker is in the scene (we’ve tracked it). How I was planning on removing the marker from the scene turned out to be a little less well planned.
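For the record, the tracking half really is the easy bit; here’s a minimal sketch of it, using OpenCV’s pyramidal Lucas-Kanade tracker rather than any of the tools above, with a made-up clip name and marker position:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("housebound_left_eye.mov")      # placeholder clip name
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Seed the tracker with the marker's position in the first frame
# (hand-picked here; a detector could do it too).
points = np.array([[[640.0, 360.0]]], dtype=np.float32)

track = [points[0, 0].copy()]
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade: follow the marker from the previous frame to this one.
    points, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, points, None, winSize=(21, 21), maxLevel=3)
    track.append(points[0, 0].copy())
    prev_gray = gray

# `track` is the marker's 2d path through the shot -- the anchor for the text.
```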
Adobe Photoshop’s “Healing Brush”, if you haven’t tried it, is a genuine marvel. While almost everything else Photoshop does is pretty much obvious to anybody well versed in 2d image manipulation (quick: try to guess the number of distinct Gaussian blur implementations that Apple alone ships with its OS), this brush, which removes blemishes while creating exactly the right sense of texture, truly does seem to do something magical. And it wasn’t obvious to my eyes or brain just how it was doing it (at least it wasn’t before SIGGRAPH 2003). It’s more than up to the task of removing our tracking markers from a video scene.
Knowing that Adobe had so solidly solved the marker removal problem I synthesized, in the back of my preproduction mind, a completely incorrect assumption: that the most recent version of After Effects would also have this wonderful feature.
It doesn’t. Incredible! Adobe ought to own the wire-removal, scratch-removal, marker-removal and intelligent-matting problems with this technology, and would do had the code that’s in Photoshop travelled down the hall to After Effects.
This omission is understandable within, and only within, the “large monolithic product” framework, where each product is a separate sovereign, competing state. But Adobe itself has been amongst the most aggressive in pushing a slightly more enlightened viewpoint: product as plugin host. After Effects already has the UI for offering the healing brush (it possesses a conventional clone tool), but Adobe remains pathetically trapped in a software framework of its own making. A prediction: just as Yahoo plans to “[turn themselves inside out][5]” into a “re-mixable” set of services, Adobe will ultimately be forced to do the same.
But for now neither their After Effects team nor I get their wonderful engineering in the form we actually want: an API that you can call from code, any old code, your code, anywhere.
What next? What any post-academic would do: read some papers, attempt to find out how they did it, implement it yourself.
We’re in a surprisingly good spot here: Adobe seems to have been permitting their senior hard-math figure [Todor Georgiev][6] to publish actual academic papers rather than just patents. Even better (and perhaps precipitating Georgiev’s release), Andrew Blake and friends published it first (Blake belongs to a small circle of computer-graphics people whose papers are, strictly, always worth a read).
Still, both papers suffer from two habits shared by academia and industry alike. Firstly, they are a tease: they stop short of telling you precisely how to do it and switch to selling you on just how good their approach is. Georgiev prefaces his paper with an inspiring, and utterly un-SIGGRAPH, theory of digital image representation that draws upon math I recognize only because I once studied General Relativity; Pérez, Gangnet and Blake, in [Poisson Image Editing][7] (now there’s a Google-proof title), take you almost all the way there and then leave you with the following sentence: “results in this paper have been computed using either Gauss-Seidel iteration with successive over-relaxation or V-cycle multigrid”. What they actually mean is that they solved their equation 7 by doing pretty much the first thing that pops into your head if you’ve understood the problem this far. Why not say so? Perhaps it leaves room for the inevitable follow-up paper, “GPU accelerated Poisson Image Editing”, where the intern / talented undergrad reveals that Gauss-Seidel / SOR for this particular problem is essentially equivalent to the much easier to accelerate and understand “blurring the image a lot” algorithm.
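At the risk of spoiling that follow-up paper, here is roughly what “blurring the image a lot” amounts to. This is only a sketch in plain NumPy (not the Field/GPU implementation linked further down): Jacobi iteration on the discrete Laplace equation inside the masked region, with everything outside the mask held fixed as the boundary condition.

```python
import numpy as np

def inpaint_membrane(image, mask, iterations=2000):
    """Fill the masked pixels of image by membrane interpolation.

    image: float array (H, W, C); mask: bool array (H, W), True where the
    tracking marker is to be removed. Returns a filled copy.
    """
    out = image.copy()
    for _ in range(iterations):
        # One Jacobi sweep: each pixel becomes the average of its four
        # neighbours -- i.e. lots of small blurs, applied repeatedly.
        avg = 0.25 * (np.roll(out,  1, axis=0) + np.roll(out, -1, axis=0) +
                      np.roll(out,  1, axis=1) + np.roll(out, -1, axis=1))
        # Only pixels under the mask are updated; everything else is boundary.
        out[mask] = avg[mask]
    return out
```

Gauss-Seidel with successive over-relaxation, or a V-cycle of multigrid, converges to the same answer, only faster; adding a guidance field to the right-hand side of each sweep gives you the rest of the Poisson-editing machinery.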
But the real crime is this: neither paper comes with source code. There’s no good reason for this, and plenty of bad ones. Space isn’t a good reason — who reads proceedings in paper form? Relevance isn’t one either — this is a self-contained problem with well-defined inputs and no architectural issues. In the last academic field I was sincerely a part of — large-scale, messy, integrative AI — there really was an argument for not distracting people by sharing your code with them. What you want to convey with a “systems paper” are design decisions, tradeoffs and perhaps a certain philosophy of approach. None of these things are well captured by the language of code, and nobody takes seriously the scientific criterion of reproducibility when it comes to writings about software design. But here, what the authors are at least pretending they want to convey is exactly what they did in order to get the results they show. No special infrastructure is required: their algorithms take two images (original image and mask) and produce a third — you can use your favorite c++ image class for all I care — an image is an image. Their code is the ground truth for exactly what they did; any other level of description, for such small, self-contained problems, is icing on the cake.
To not share the very code that they have written (and clearly extensively polished) to make their very accomplished demos would seem extremely odd if it weren’t so commonplace. Ultimately it’s nothing less than two-faced: a claim to be committed to the exchange of ideas, but not too committed. It’s hard to know how we got here — is it the hope of commercialization? Fear of academic competition? Laziness?
I am bound by no such desires or fears. And to provide a satisfying punch-line to this argument I’m making a [Field][8]-based open-source implementation of something very Healing-Brush-ish available for download by anybody (anybody, that is, who’s willing to accept the terms of the license) from the [Field development site][9]. It’s probably less than a SIGGRAPH-paper column long. Heck, it’s even GPU accelerated. Here it is working:
[Take it][10], go do something interesting with it.
My frustration with academic pseudo-publishing continued for the rest of the week with another surprise. Because of the half-silvered mirror in our stereo rig, our camera pair doesn’t produce the same image when presented with the same scene. The mirror seems to reflect more green than it transmits, and the 300MB/s pair of video streams that we obtain needs to be carefully color matched, one against the other.
Unlike “healing brushes”, there’s hardly any interesting math to color matching — just user interface. For this problem we find ourselves again a little far from the mainstream. But it’s just as much fun to build your own interactive color matching tool in Field:
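The interactive part is hard to convey in prose, but the non-interactive core of such a tool is roughly the following sketch (plain NumPy here rather than Field): estimate a per-channel gain and offset that pulls the reflected eye’s colors onto the transmitted eye’s by matching channel means and standard deviations.

```python
import numpy as np

def match_channels(source, reference):
    """Color-match source to reference.

    source, reference: float arrays (H, W, 3) of the two eyes viewing the
    same (or a shared calibration) scene. Returns a matched copy of source.
    """
    matched = np.empty_like(source)
    for c in range(3):
        s, r = source[..., c], reference[..., c]
        gain = r.std() / max(s.std(), 1e-6)     # stretch to the reference contrast
        offset = r.mean() - gain * s.mean()     # then line the means up
        matched[..., c] = gain * s + offset
    return np.clip(matched, 0.0, 1.0)
```

From there, an interactive version is mostly a matter of hanging sliders and a live preview off parameters like these.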
That wasn’t a problem. But had I thought about this beforehand I would surely have realized that it was going to cause problems for my dense optical flow code — algorithms that try to tell you what each bit of one image corresponds to in another. Generally such code depends on a pixel in one image being the same-ish color as the corresponding pixel in the other frame in order to find the correspondences. Not to worry, I would have thought, had I been thinking about it at all: optical flow has been a topic of research in computer vision since David Marr was inventing the field itself in the 70s — there’s plenty of research that I can draw upon.
Well, yes and no. This too is a problem with low infrastructural complexity — take two images of a scene separated slightly in space or time and produce a third: an image of “arrows” that point from a pixel in one image to a pixel in the other. But this too is a place where hardly anyone is sharing actual, honest code. Worse, since this is a far older problem than image in-painting, it has had time to shed its SIGGRAPH “ain’t my movie cool” sparkle and has become a serious game played by serious players.
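The contract really is that small; calling a dense flow algorithm looks like the sketch below, which leans on OpenCV’s Farnebäck implementation purely as a stand-in for the published methods (the frame filenames are placeholders).

```python
import cv2

# Two consecutive frames from one eye of the rig; the filenames are placeholders.
a = cv2.cvtColor(cv2.imread("frame_0001.png"), cv2.COLOR_BGR2GRAY)
b = cv2.cvtColor(cv2.imread("frame_0002.png"), cv2.COLOR_BGR2GRAY)

# Dense flow: an (H, W, 2) array of "arrows" (dx, dy), one per pixel, pointing
# from a position in frame a to the matching position in frame b.
# Arguments after the images: flow, pyr_scale, levels, winsize, iterations,
# poly_n, poly_sigma, flags.
flow = cv2.calcOpticalFlowFarneback(a, b, None, 0.5, 4, 21, 3, 7, 1.5, 0)
```

And like most of the classical methods, Farnebäck assumes something close to brightness constancy, which is exactly why the color matching above has to happen first.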
But the emphasis in that serious game is very much on game. There are standard images for you to fight on and, unbelievably, [even a league table][11]. Optical flow algorithms are being evaluated under rules that might be slightly less open than the [Netflix prize][12] — freely available training data with hidden test data, online submission, public rankings, anonymous submitters and, in this case, few rules concerning publication. When I checked, topping the charts this conference season were three or four utterly unpublished algorithms by masked authors. There’s a web 2.0 interface so you can browse just how good they are, but you can’t get hold of them yourself. What is the prize, I wonder? Tenure? A job at MSR? Weta Digital? Surely not, at this point, a video compression start-up?
Spurious arguments about intellectual property have no place here — you are either committed to sharing ideas or not; you are either working on your dissertation or you are working on your startup. Too many people are essentially getting the intellectual equivalent of patents published by the ACM and IEEE, and these organizations’ willingness to add a QuickTime to any paper only encourages this misdirection. Sure, they don’t have the legal weight or the financial future of a patent, but they mark something just as precious for the tenure-track faculty member or the career researcher: territory.
—5/08