Field offers a few ways of drawing things to the screen (either to its canvas or to a fullscreen window): the high-level FLine/PLine drawing system and the lower-level base graphics system. If you care about frame rate, you might be wondering which to use and how. Here are the tradeoffs:
- Drawing FLines or geometry containers that don't change from frame to frame is either relatively inexpensive (FLine), very inexpensive (FLine in a Direct Layer), or close to free (for the BaseGraphicsSystem case). More often than not you are short on CPU power, not GPU power.
- Every FLine / geometry container costs resources on first use: FLines that are completely new allocate resources when they are first drawn. It's almost always faster to mutate the contents of an existing FLine than it is to throw one away and make a new one.
- Every FLine / geometry container has overhead associated with it: putting all of your drawing into a small number of FLines is always better than having large numbers of them.
Let's go through the process of optimizing a hypothetical, but still illuminating, animation task: drawing lots of circles to the screen.
FLine directly into a shader
First, let's just draw something. In a new box, we'll need some boilerplate:
canvas = makeFullscreenCanvas()
from field.graphics.windowing import FrameRateThread
fps = FrameRateThread(canvas)
FrameRateThread is a secret utility that will print a Field canvas's FPS to standard output every second or so. We'll use a combination of that and top -u to monitor how fast we are drawing, and how much CPU it's taking us to do it. Presumably if you are using Field for something interesting, you want the CPU to be doing that interesting thing, not shuffling geometry to the screen.
With the above in place, we can just draw a triangle to the screen:
output = canvas.getOnCanvasLines()
lines = output.submit
lines.clear()
# one line
f = FLine().moveTo(0,0,0)
f.lineTo(10,10,0)
f.lineTo(0,10,0)
f(filled=1, color=Color4(1,1,1,1))
lines.add(f)
That gets us a triangle on the canvas (you might need to back out a bit with the camera, use shift-down):
Obviously our frame rate doesn't change when we draw a single triangle: you'll find the FPS is either 60 or 100, depending on your OS and screen refresh rate. top -u puts Field's CPU usage at <10% of a core.
Now let's stress things out a bit:
output = canvas.getOnCanvasLines()
lines = output.submit
lines.clear()
for x in floatRange(-500, 500, 50):
    for y in floatRange(-500, 500, 50):
        f = FLine()
        f.circle(15.5, x+Math.random()*50, y)(filled=1, color=Color4(1,1,1,0.5))
        lines.add(f)
That draws 2500 filled and stroked circles in a square, perturbed grid:
At around this point — on a recent MacBook Pro — our framerate starts to dip below that of the screen; CPU usage is around 80% of a core. And we're not even animating anything!
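It helps to be able to check circle counts quickly when changing grid sizes. The sketch below is plain Python rather than Field code: float_range is a hypothetical stand-in for floatRange, and it assumes the third argument is a number of samples rather than a step size (which is what the 2500-circle figure for a 50 by 50 grid suggests). Whether the real function includes the end point is also an assumption.

```python
# Hypothetical stand-in for Field's floatRange. Assumption: the third
# argument is a sample count, and both end points are included.
def float_range(start, stop, n):
    """Return n evenly spaced values from start to stop."""
    if n == 1:
        return [float(start)]
    step = (stop - start) / float(n - 1)
    return [start + i * step for i in range(n)]


def grid_count(n):
    """Number of circles produced by two nested loops of n samples each."""
    xs = float_range(-500, 500, n)
    ys = float_range(-500, 500, n)
    return len(xs) * len(ys)


# Circle counts for a few square grid sizes.
counts = dict((n, grid_count(n)) for n in (20, 50, 100, 1000))
for n in sorted(counts):
    print("%4d x %-4d grid: %d circles" % (n, n, counts[n]))
```

The count grows with the square of the grid size, so doubling the argument quadruples the number of FLines that have to be created and drawn.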
FLine to "Direct Layer"
We can slash our CPU usage by using Direct Layers. These are layers optimized for drawing directly to canvases (unlike the full FLine drawing system, which is flexible enough to host web pages and make PDFs).
# of course, if you already have a shader that you
# want to use, you can use it here
shader = makeShaderFromElement(_self)
# disable depth test to remove z-fighting
shader << DisableDepthTest()
output = shader.getOnCanvasLines(canvas)
lines = output.getDirectLayer("myLayer")
lines.submit.clear()
for x in floatRange(-500, 500, 50):
    for y in floatRange(-500, 500, 50):
        f = FLine()
        f.circle(15.5, x+Math.random()*50, y)(filled=1, color=Color4(1,1,1,0.5))
        lines.submit.add(f)
That code takes our frame rate back up to being "refresh limited" (i.e. as fast as your screen / OS will permit) and our CPU usage down to around 10% again.
If we push that up to 100x100 rather than 50x50 (some 10,000 circles), we get something like this:
At this point, especially on laptop graphics hardware, you might find your display "fill rate limited": your frame rate is limited by the number of (overlapping) pixels drawn to. You can see that as you move in and out of the scene your frame rate waxes and wanes. Far away everything is fast (because your scene covers only some fraction of the screen); close up everything is fast (because much of your scene is offscreen); but in the middle everything slows down quite a bit. There is basically nothing Field or any other tool can do about this. At this point, it's between you and your graphics card.
But at 10,000 circles you'll also notice that Field's CPU usage is creeping up again. This is now caused by the overhead of all of those FLines. If you can, you might want to consider putting everything inside a smaller number of FLine instances. For example:
lines.submit.clear()
f = FLine()
for x in floatRange(-500, 500, 100):
    for y in floatRange(-500, 500, 100):
        f.circle(20.5, x+Math.random()*50, y)(filled=1)
lines.submit.add(f)
Here you'll notice that, for overlapping filled pieces of geometry, the results are not exactly the same:
A close-up reveals why:
Field's tessellator (the thing that fills lines in) has carefully applied the same algorithm to our cloud of circles as it would use to carve the holes out of letterforms. Obviously there are other combinations of FLine properties (colors and attachments to shaders) that might preclude you putting all of your geometry into a single massive FLine. To fix this, see .starConvex below.
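One way to picture why overlapping circles in a single FLine end up with holes is a fill rule applied to the combined outline. The sketch below is plain Python and assumes an even-odd rule, where a point is inside if it is enclosed an odd number of times; the text doesn't say which rule Field's tessellator actually uses, so treat this as an illustration of the effect rather than a description of the implementation. All function names are made up for the example.

```python
import math


def circles_containing(point, circles):
    """Count how many (cx, cy, r) circles contain the point."""
    px, py = point
    return sum(1 for (cx, cy, r) in circles
               if math.hypot(px - cx, py - cy) < r)


def filled_as_one_path(point, circles):
    """Even-odd rule over one combined outline: odd coverage is inside."""
    return circles_containing(point, circles) % 2 == 1


def filled_as_separate_paths(point, circles):
    """Each circle filled on its own: any coverage is inside."""
    return circles_containing(point, circles) >= 1


# Two overlapping circles of radius 20.5 with centres 20 units apart.
circles = [(0.0, 0.0, 20.5), (20.0, 0.0, 20.5)]
overlap_point = (10.0, 0.0)   # inside both circles
single_point = (-10.0, 0.0)   # inside the first circle only

print(filled_as_separate_paths(overlap_point, circles))
print(filled_as_one_path(overlap_point, circles))
print(filled_as_one_path(single_point, circles))
```

Under that assumption the overlap region is filled when each circle is its own path, but is carved out when both circles share one path, exactly as the counter of a letter "o" would be.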
Like most high-quality graphics rendering platforms, Field carefully subdivides cubic spline segments before it draws them. It only stops this recursive subdivision when the segments it's producing appear flat or are too small. But for live animation this strategy is too CPU intensive, and by continually varying the amount of geometry produced it thwarts attempts to cache it. Direct layers do something much dumber: they subdivide every .cubicTo into a constant number of segments. You control this via the layer returned from .getDirectLayer(...):
# of course, if you already have a shader that you
# want to use, you can use it here
shader = makeShaderFromElement(_self)
# disable depth test to remove z-fighting
shader << DisableDepthTest()
output = shader.getOnCanvasLines(canvas)
lines = output.getDirectLayer("myLayer")
lines.fixedCurveSampling=1
lines.submit.clear()
f = FLine()
for x in floatRange(-500, 500, 50):
    for y in floatRange(-500, 500, 50):
        f.circle(20.5, x+Math.random()*50, y)(filled=0)
lines.submit.add(f)
lines.fixedCurveSampling=1 yields only one line per cubic segment:
Whereas lines.fixedCurveSampling=10 gives:
In situations where you are animating, or CPU limited, you might want to carefully tune this property.
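To make the difference between the two strategies concrete, here is a minimal plain-Python sketch of both: constant-count sampling of a cubic Bezier, and recursive subdivision that stops when a piece looks flat. It is not Field's code. The flatness test, the tolerance values and the quarter-circle control points are all assumptions chosen for illustration.

```python
import math


def cubic_point(p0, p1, p2, p3, t):
    """Evaluate a 2D cubic Bezier at parameter t."""
    u = 1.0 - t
    a, b, c, d = u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t
    return (a * p0[0] + b * p1[0] + c * p2[0] + d * p3[0],
            a * p0[1] + b * p1[1] + c * p2[1] + d * p3[1])


def fixed_sampling(p0, p1, p2, p3, n):
    """Constant-count subdivision: always n line segments (n + 1 points)."""
    return [cubic_point(p0, p1, p2, p3, i / float(n)) for i in range(n + 1)]


def _mid(a, b):
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def _dist_to_chord(p, a, b):
    """Distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (a[1] - p[1]) - dy * (a[0] - p[0])) / length


def adaptive_sampling(p0, p1, p2, p3, tolerance, depth=0, max_depth=12):
    """Recursive subdivision that stops once the piece looks flat."""
    deviation = max(_dist_to_chord(p1, p0, p3), _dist_to_chord(p2, p0, p3))
    if deviation <= tolerance or depth >= max_depth:
        return [p0, p3]
    # de Casteljau split at t = 0.5
    p01, p12, p23 = _mid(p0, p1), _mid(p1, p2), _mid(p2, p3)
    p012, p123 = _mid(p01, p12), _mid(p12, p23)
    m = _mid(p012, p123)
    left = adaptive_sampling(p0, p01, p012, m, tolerance, depth + 1, max_depth)
    right = adaptive_sampling(m, p123, p23, p3, tolerance, depth + 1, max_depth)
    return left + right[1:]


# A quarter circle of radius 20.5 approximated by one cubic segment.
r, k = 20.5, 0.5522847498
quarter = ((r, 0.0), (r, k * r), (k * r, r), (0.0, r))

fixed = fixed_sampling(*quarter, n=5)
coarse = adaptive_sampling(*quarter, tolerance=1.0)
fine = adaptive_sampling(*quarter, tolerance=0.1)
print(len(fixed) - 1, len(coarse) - 1, len(fine) - 1)
```

The fixed version always produces the same number of vertices for the same input, which is what makes caching easy. The adaptive version's output size depends on the tolerance (and, in a real renderer, on the zoom level), so the amount of geometry can change from frame to frame.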
.starConvex=1
The examples above are all focused on the static case: run some code that produces some geometry; send the geometry to the graphics system for rendering; explore it with a camera. What about geometry that changes per frame? Similar concerns apply: there's per-vertex, per-pixel and per-FLine overhead. Additionally, there's "per-line of Jython" overhead as well.
Here's a test:
output = shader.getOnCanvasLines(canvas)
lines = output.getDirectLayer("myLayer")
def someAnimation():
    lines.submit.clear()
    for x in floatRange(-500, 500, 20):
        for y in floatRange(-500, 500, 20):
            f = FLine()
            f.circle(20.5, x+Math.random()*50, y)(filled=1)
            lines.submit.add(f)
_r = someAnimation
This yields four hundred randomly moving circles at somewhere around 15fps @ 100% CPU usage. Not good at all.
Can we do better? Yes. As usual, we have to jettison some of the flexibility of the FLine drawing system. First let's tweak the tessellator. Direct layers offer a property, .starConvex, that tells Field to use a far simpler and less general tessellation algorithm.
def someAnimation():
    lines.submit.clear()
    for x in floatRange(-500, 500, 20):
        for y in floatRange(-500, 500, 20):
            f = FLine()
            f.circle(20.5, x+Math.random()*50, y)(filled=1, starConvex=1)
            lines.submit.add(f)
That grabs us another few frames-per-second.
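The name hints at why this can be cheaper. A shape is star-convex if there is a centre point from which every point on the outline is visible, and such a shape can be filled with a plain triangle fan, with no need to search for holes or self-intersections. The plain-Python sketch below shows that fan. It is only what the property name suggests, not a confirmed description of what Field does internally, and the function names are invented for the example.

```python
import math


def circle_outline(cx, cy, r, n):
    """n points around a circle, counter-clockwise."""
    return [(cx + r * math.cos(2 * math.pi * i / n),
             cy + r * math.sin(2 * math.pi * i / n)) for i in range(n)]


def fan_triangulate(outline):
    """Triangle fan around the average of the outline's vertices.

    Only valid when every outline point is visible from that centre,
    i.e. when the shape is star-convex with respect to it.
    """
    n = len(outline)
    centre = (sum(p[0] for p in outline) / float(n),
              sum(p[1] for p in outline) / float(n))
    return [(centre, outline[i], outline[(i + 1) % n]) for i in range(n)]


def triangle_area(a, b, c):
    return abs((b[0] - a[0]) * (c[1] - a[1])
               - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


outline = circle_outline(0.0, 0.0, 20.5, 20)
triangles = fan_triangulate(outline)
fan_area = sum(triangle_area(*t) for t in triangles)
print(len(triangles), fan_area)
```

The fan produces exactly one triangle per outline edge in a single pass, which is why it suits circles so well. It would give wrong results for shapes such as a letter "C", so a general tessellator cannot rely on it.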
Combining all these optimizations together, and dropping the stroke of the circle, gives us 400 circles, animating arbitrarily at 30fps @ 50% CPU:
from NewCachedLines import FLine, CFrame
output = shader.getOnCanvasLines(canvas)
lines = output.getDirectLayer("myLayer")
lines.submit.clear()
lines.fixedCurveSampling=5
f = FLine()
lines.submit.add(f)
def someAnimation():
    f.clear()
    for x in floatRange(-500, 500, 20):
        for y in floatRange(-500, 500, 20):
            f.circle(20.5, x+Math.random()*5, y)(filled=1, stroked=0, starConvex=1)
    f.forceNew=1
_r = someAnimation
At this point, most of that CPU usage is split evenly between the FLine drawing system and overhead associated with Jython.
Can we do better than this? To go any further in our circle-drawing task, we need to jettison the FLine system altogether. There's too much there that's either expecting you to draw arbitrary shapes or expecting you to reuse geometry between frames for this to be the fastest way of drawing circles. Just how fast can we make drawing circles anyway?
Here's how to do this in a completely optimized way. In a new box:
thickShader = makeShaderFromElement(_self)
canvas << thickShader
quads = pointContainerWithQuads()
thickShader << quads
thickShader << DisableDepthTest()
with quads:
    for x in floatRange(-500, 500, 1000):
        for y in floatRange(-500, 500, 1000):
            v = quads.nextVertex(x,y,0)
            quads.setAux(v, 5, 2)
This sends one million points to a point container associated with a shader (thickShader). pointContainerWithQuads draws a quad for each call to .nextVertex(); it's basically exactly what you need if you want to draw a bunch of "point sprites". Of course, without any intelligence in the shader, you'll see absolutely nothing. Let's turn these point sprites into circles.
First the vertex shader:
varying vec4 tex;
attribute vec4 s_Color;
attribute vec4 s_Texture;
attribute vec4 s_Five;
void main()
{
    gl_Position = gl_ModelViewProjectionMatrix * (gl_Vertex+vec4(s_Texture.x-0.5, s_Texture.y-0.5, 0.0, 0.0)*2.0*s_Five.x);
    tex.xy = s_Texture.xy;
}
pointContainerWithQuads uses s_Texture.xy to label the corners of each quad that comes out of .nextVertex. s_Texture.xy goes from 0,0 to 1,1 across the corners. We use this and s_Five.x to make these points thick. And we pass s_Texture.xy down into the fragment shader:
varying vec4 tex;
void main()
{
    float r = length(tex.xy-vec2(0.5, 0.5))/0.5;
    if (r<0.8) discard;
    r = smoothstep(1.0, 0.9, r)*smoothstep(0.8, 0.9, r);
    gl_FragColor = vec4(r,r,r,r);
}
This little piece of near-assembly code takes r, the distance from the center of the quad, and uses it to create a smooth annulus by multiplying together two smoothstep functions. Obviously the code for filled circles is even more straightforward.
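If you want to check the arithmetic of the two shaders without a graphics card, the following plain-Python sketch emulates them for a single quad. The function names are invented for the example, and smoothstep is written the way it is commonly implemented. Note that the first smoothstep call in the fragment shader passes its edges in descending order: the arithmetic below handles that, but the GLSL specification leaves the result undefined when edge0 >= edge1, so behaviour on real drivers is an assumption.

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def smoothstep(edge0, edge1, x):
    """Hermite step, as commonly implemented for GLSL's smoothstep."""
    t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)


def quad_corner(center, tex, size):
    """Vertex stage: push one corner out from the point's position.

    tex plays the role of s_Texture.xy (0..1), size of s_Five.x.
    """
    return (center[0] + (tex[0] - 0.5) * 2.0 * size,
            center[1] + (tex[1] - 0.5) * 2.0 * size)


def annulus_value(tex):
    """Fragment stage: None means 'discard', otherwise the value
    written to all four channels of gl_FragColor."""
    dx, dy = tex[0] - 0.5, tex[1] - 0.5
    r = (dx * dx + dy * dy) ** 0.5 / 0.5
    if r < 0.8:
        return None
    return smoothstep(1.0, 0.9, r) * smoothstep(0.8, 0.9, r)


# A point at the origin with s_Five.x = 5 becomes a 10 x 10 quad.
corners = [quad_corner((0.0, 0.0), tex, 5.0)
           for tex in ((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0))]
print(corners)

# Sample the fragment stage along the quad's horizontal centre line.
for u in (0.5, 0.9, 0.95, 1.0):
    print(u, annulus_value((u, 0.5)))
```

The ring is brightest at r = 0.9, fades to zero at r = 1.0, and everything inside r = 0.8 is discarded. The corners of the quad lie at r of about 1.41, so they come out fully transparent.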
Together these code fragments give you a sea of "points" drawn as circles:
Even with 100,000 points (a 100x1000 grid), this scene is refresh limited (~60fps) at <4% CPU. Field is issuing a handful of OpenGL calls each frame to draw everything. You can use this approach to draw a million point sprites, but you have to be careful not to reach the fill limits of your hardware, i.e. keep the circles small.
Dropping down a few levels of abstraction has also sped up animation:
def animate():
    with quads:
        for x in floatRange(-500, 500, 100):
            for y in floatRange(-500, 500, 100):
                v = quads.nextVertex(x+Math.random()*50,y,0)
                quads.setAux(v, 5, 30)
_r = animate
This speedup is a combination of reduced bandwidth to the graphics card (we're sending four vertices per circle rather than the actual circle geometry), reduced overhead from Field, and reduced overhead from Jython (since the above code is very straightforward). This renders 10,000 circles on a laptop at around 25fps @ 90% CPU usage. Our previous 400 circle case is too fast to measure.
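A rough way to compare the three animated approaches is to turn the quoted frame rates and CPU percentages into CPU time per circle per frame. The plain-Python sketch below does that arithmetic using only the figures quoted on this page. It assumes the CPU percentages are shares of a single core and that all of that time goes into drawing, so read the results as ballpark numbers, not benchmarks.

```python
def cpu_microseconds_per_circle(circles, fps, cpu_fraction):
    """CPU time spent per circle per frame, in microseconds.

    cpu_fraction is the share of one core in use (1.0 == 100%).
    """
    frame_seconds = 1.0 / fps
    cpu_seconds_per_frame = frame_seconds * cpu_fraction
    return cpu_seconds_per_frame / circles * 1e6


# (description, circles, frames per second, CPU share) as quoted in the text.
measurements = [
    ("one FLine per circle in a Direct Layer", 400, 15, 1.00),
    ("single reused FLine, starConvex, no stroke", 400, 30, 0.50),
    ("point container with quads and a shader", 10000, 25, 0.90),
]

costs = [cpu_microseconds_per_circle(n, fps, cpu)
         for (_, n, fps, cpu) in measurements]
for (label, _, _, _), cost in zip(measurements, costs):
    print("%-45s %8.1f us per circle" % (label, cost))
```

By this estimate each step cuts the per-circle CPU cost substantially, with the shader-based version roughly an order of magnitude cheaper per circle than the best FLine version.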
Adding better (screen-space) antialiasing to these circles and color, we can animate scenes like this:
The code for this can be downloaded here.