Sunday 3 April 2011

How to Parallelise A Haskell Raytracer

One of the big attractions of functional languages is that they're theoretically very easily to parallelise as they have no problematic side-effects You only describe how to perform a calculation, and never touch on the details of moving data around the machine. The run-time is then free to process data, given your functional expressions, in parallel.

I've been experimenting with Haskell's basic parallelism primitives, par, and pseq. These primitives trigger ("spark") evaluation of expressions in parallel - but only if the runtime deems it necessary.

After many failed or mildly successful experiments, I've finally managed to get a speedup. In short, I stopped trying to reinvent the wheel myself. I simply used parListChunk:


rayTraceImage :: Camera -> Int -> Int -> SceneGraph -> [Light] -> [Colour]
rayTraceImage camera renderWidth renderHeight sceneGraph lights = map (clamp . tracePixel eyePosition sceneGraph lights) rayDirections `using` parListChunk 256 rseq
    where !rayDirections = [makeRayDirection renderWidth renderHeight camera (x, y) | y <- [0..(renderHeight - 1)], x <- [0..(renderWidth - 1)]]
          !eyePosition = Camera.position camera

Here, we break the list of rays to be traced into data parallel chunks of 256, each free to be evaluated by a different thread. Each of these chunks is processed using map. Map applies the raytracing function to convert a ray direction into a colour, and we clamp the result.

And that's it.

Running it on two threads on my dual-core MacBook Pro halves the execution time.

What's nice is that the core 'map' expression is separated from the details of parallelisation. We've separated the concepts of raytracing and its parallel decomposition into two separate concerns. We are therefore free to vary either without intertwining their concerns.

Not a critical section, lock-free queue or atomic operation in sight!

Clearly, the nature of the functional approach surrenders a lot of low-level implementation details to your platform's runtime. However, it is more frequently the case that when you require such low-level intervention, you are working around deficiencies in your run-time platform, rather than dealing with fundamental algorithm details. Parallelising many algorithms follows established patterns.

It is also worth noting that the raytracer code is currently pretty straightforward code. I've not implemented any particular tricky optimisations, and certainly none of the ideas in work such as Ingo Wald's parallel raytracing PhD thesis. Not yet, anyway...

2 comments:

  1. A massively parallel GPU-based raytracer (with GPL code):

    http://users.softlab.ece.ntua.gr/~ttsiod/cudarenderer-BVH.html

    ReplyDelete