Field's OpenCL Integration

Field contains an experimental set of bindings to OpenCL — an "industry standard" language and runtime for parallel processing that's available under Snow Leopard. This is interesting for two reasons — one, it provides a way of using computational resources — both GPU and CPU — that are very hard to write code for from any platform — Java or otherwise; two OpenCL is built with an eye to integrating it with Open_GL_ and Field already has an OpenGL-based graphics system. So Field provides a light Python / Java wrapper for you to use "interactively" in coding against this library.

Two things to keep in mind: while programming OpenCL from Field is fun, and the speedups can be tremendous, it's still a hard language and programming model to get into. The best resource is the Apple example code — in fact we'll be borrowing one of their examples here. Furthermore it's not just hard, it's unforgiving — the drivers are really fresh. We've had spectacular video subsystem crashes like we haven't seen since the days of the Amiga in response to off-by-one errors.

The .field sheet that contains the code for this page can be downloaded here. If you are not using the very latest-and-greates from Apple, you might have some trouble on some graphics cards or multi-graphics card setups.

A simple example

We're going to port the Apple example "OpenCL Ray Traced Quaternion Julia-Set", which is available here. There are a bunch of references on that page not just to Apple example C code, but also to pages that will tell you what a Quaternion Julia-Set is and why you might want to ray trace one. In short, it's fractal eye candy throwback to the early 90s, but computed at thrilling speeds.

The example, like many OpenCL workloads, in structured in the following way. There's a program, written in the OpenCL C-like language that's going to run on the GPU. This program takes a number of inputs of various types — in our case a couple of Vector4 parameters, a single scalar value and a chunk of memory (to be understood as a 2d buffer) where the computed result should got. You'll find other OpenCL programs will also consume chunks of memory for input.

In order to "see" the output chunk of memory we need to get it out of OpenCL and into an OpenGL texture — an operation that should be, in this case, either free or very fast. After all, both the OpenCL stuff and the OpenGL stuff are executing on the graphics card.

Part 1 — OpenGL

Let's see how this gets put together in Field. First the boring, non-CL part (which we'll do "long-form"):

In a new box in Field, let's put:

canvas = getFullscreenCanvas()
shader = makeShaderFromElement(_self)

plane = meshContainer()
canvas << shader << plane

with plane:
    plane ** [Vector3(0,0,0), Vector3(1,0,0), Vector3(1,1,0), Vector3(0,1,0)]
    plane ** [0,1,2] 
    plane ** [0,2,3] 

plane.aux(5, 2).put([0,0,  1,0, 1,1, 0,1])

As you'll recall from the docs this code will make a full screen canvas, make a GLSLang shader associated with the current element, make a mesh that we can add geometry to called 'plane' and connect these things together — the shader is attached to the canvas and the plane is attached to the shader.

The rest of the code makes a plane 1 unit large with texture coordinates, see here for a refresher.

For the GLSLang shader we'll write:

attribute vec2 s_Five;
varying vec2 textureCoordinates;

void main()
{
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
    textureCoordinates = s_Five;
}

for the vertex shader and:

uniform sampler2D theTexture;
varying vec2 textureCoordinates;

void main()
{
    gl_FragColor = texture2D(theTexture, textureCoordinates.xy);
    gl_FragColor.r += 0.1;
    gl_FragColor.a = 1.0;
}

for the fragment shader. Nothing interesting here, just straightforward 2d texturing with just a little extra red.

Because once you've executed that all you'll see is that extra red — a very dark red plane, or, possibly uninitialized texture "noise" because you are missing a texture. It's the texture that we are going to connect to OpenCL.

Part 2 — OpenCL

Let's do some new stuff in a new box.

from field.graphics.opencl import OpenCLSystem
from field.graphics.opencl.OpenCLSystem import *

cl = OpenCLSystem(canvas.getPostQueue())
width,height = 512,512

this creates a new cl object associated with the canvas, and sets up two variables width and height.

Next we need to compile our OpenCL code. I've found the best thing to do is to make a new box and type:

source = """

// here is my really long OpenCL code

"""

in it. I can execute it one to define the string "source". In our case we are going to rip the source right out of Apple's example code here called "qjulia_kernel.cl".

If you take a close look near the end of the file, you'll find the main entry point — it's marked as a __kernel. Specifically, it says:

__kernel void
QJuliaKernel(
    __global uchar4 *result,
    const float4 mu,
    const float4 diffuse,
    const float epsilon)
{
...
}

To compile our OpenCL string "source" we write:

kernel= cl.makeKernel( source, "QJuliaKernel")

... including the name of the _kernel entry point.

Now we have four parameters to set up — the uchar4 *result, the two float4 parameters and the scalar.

result = cl.makeMemory(4*width*height)
kernel[0] = result
kernel[1] = Vector4(-0.2, -0.4, 0.3,Math.random())
kernel[2] = Vector4(0.5,0.7,1,1)
kernel[3] = 0.0003

This code sets the four parameters to the kernel. The first parameter needs to be created — it's essentially a very fancy malloc. For the next two we can use Field's Vector4 class.

Now we have our compiled code, an entry function and parameters for the entry function — all that remains is to execute the code:

kernel.execute2D(width,height,-1,-1)

(Those -1's refer to the size of the work units, the geometry of how the parallel ray tracing jobs get split up over the dazzling number of processors inside your GPU, '-1' tells OpenCL / Apple to figure it all out itself).

Part 3 — seeing the result

You might have just ray traced a quaternion Julia set, but you still can't see it. Time to tie the result buffer to a texture.

glTexture = cl.addTexture(0, plane, width, height)

That call adds a texture to plane of the right width and height connected to texture unit 0 (the first one).

Finally, the code that actually "transfers" data from result to the texture.

glTexture.copyBufferToTexture(result);

With this call you'll see your Julia set on the screen:

For extra fun we create a new element in Field and write:

def update():
    kernel[1] = Vector4(-0.2, -0.4, 0.3,Math.random())
    kernel.execute2D(width,height,-1,-1)
    glTexture.copyBufferToTexture(result);

_r = update

This randomizes one of the parameters, executes the kernel and copies out the result. Then, when we option-click on this box, we get an animated ray-traced quaternion julia set, moving faster than the eye can see. If you weren't there, you've no idea how cool this would have been back in '92.

Where to go from here

The first thing you might do is prod us for more documentation — it's a pretty small API and we think we've got it covered under the hood. The drivers are rough at this stage, but we're pretty excited about OpenCL — we're going to be using it in production pretty soon, so we'd appreciate some extra eyeballs.