Pushing around over 300,000 3D particles in real time, on screen, using Flash? No problem, if you are using Adobe Alchemy and Pixel Bender to compile and run your code!
During my session “professionally pushing pixels” at FITC Amsterdam this year, I talked, amongst other things, about how best to use parts of the Flash Player to get top performance. This is one of the examples I showed. What you are seeing in this example is over 300,000 particles being transformed in 3D, projected and drawn to 2D. And it does so at quite a good framerate (though that will depend on your machine too).
So, how do we achieve this? The answer is a combination of Pixel Bender and Alchemy.
First, let’s look at the 3D transformation and projection.
Flash 10 has a number of native features for 3D transformation and projection, built around classes such as Vector, Vector3D, Matrix3D and PerspectiveProjection. Although these features are great, we can’t easily use them in combination with Alchemy. I’ll explain why later; for now, let’s look at an alternative way to do the projection.
Where, oh where, in the Flash Player do we have a way of doing very fast math? The answer: Pixel Bender! Although Pixel Bender is normally used for image manipulation, you can make it do any kind of number-crunching that can be executed in parallel and without loops.
To calculate the rotations and project our 3D data, we use Pixel Bender in “ShaderJob” mode. When Pixel Bender is used in image-based mode, it operates at 8 bits per channel. Thankfully, a ShaderJob allows 32-bit precision per channel for the data processing. Since 8-bit precision wouldn’t be enough for this example, we use a ShaderJob.
The VertexProjector Pixel Bender kernel, included with the source, is a simple way of transforming and projecting vertices (representing particles, in this case) in 3D space. We feed this kernel a ByteArray of x, y, z triplets and execute the ShaderJob. It then returns the data as a ByteArray in px, py, pz format.
Drawing things to screen.
Now that we have all the 3D data projected to 2D, we need to draw it to the screen, and we have to do so as quickly as possible. This step is traditionally called rasterization. In AS3, you’re most likely to use setPixel when drawing on a per-pixel basis. Doing so in a loop for 300,000 pixels turns out to be very slow. The solution would be to optimize that loop as much as possible, either by writing your own bytecode or by writing your own post-processor for your code before you compile. But we don’t have to, since Adobe Alchemy exists.
As you can read in my earlier post about Adobe Alchemy, I openly questioned why it was so speedy compared to regularly compiled ActionScript 3 code. Although the answer is rather complex, the combination of C-based code, the LLVM compiler and “Alchemy Virtual Memory” forms the basis of it. The large difference between Alchemy-compiled code and regularly compiled ActionScript can be further explained by the regular AS3 compiler not doing any optimisation. This example shows off those performance gains.
One thing to worry about when using Alchemy in your ActionScript projects is marshalling; Branden Hall’s post on Alchemy has more info on that. Since we wouldn’t be able to marshal 300,000 vertices from a Vector.<Number> in AS3 to our Alchemy code fast enough, we need a better solution. This is exactly why we are using Pixel Bender and, moreover, ByteArray data.
It is possible to manipulate the memory Alchemy uses at runtime. This memory is represented as an AS3 ByteArray object. If we read and write our data directly in this memory block, no marshalling is needed. Not everything can be done this way, but for some things it is very useful; for instance, passing large blocks of data, like images and ByteArrays of coordinates.
Getting all these 3D particles to the screen takes just one inner loop. While we would normally call setPixel for that, in Alchemy code we don’t have that luxury. Instead, we write directly to our screen buffer memory, which is represented as a set of ints. Here, one more problem comes into play: endianness, which defines the byte ordering of a set of data. Alchemy uses little-endian byte order for its internal memory representation. Specifically, it uses a small class called LEByteArray, which extends ByteArray and ensures no changes are made to the endianness of the memory. That makes sense, since otherwise your code would blow up.
Writing to the screen is then a piece of cake. We take the Alchemy-processed data from its memory and write it to a BitmapData using the formerly much less useful setPixels() method. It’s amazing to see how fast this is.
Look at the example here, and download the full source code here. As you can see from the example, doing this with Alchemy instead of regular ActionScript gives a speed increase of nearly five times.
In the future I’ll be posting more demos of the technology, among them an application for the future version of Papervision3D, PapervisionX.