Tom Hammersley's Graphics and Games Development Blog: March 2011

Thursday, 24 March 2011

Laziness vs Strictness in Haskell

I've just taken over a minute of execution time off my raytracer.

One of Haskell's biggest language features is the lazy evaluation system. By default, Haskell evaluates nearly all expressions lazily, as needed. This is one of the features that drew me to Haskell, and indeed it seems a reasonably apt way of writing a raytracer. Raytracing often consists of selectively traversing different parts of large databases for different parts of the scene. Many acceleration schemes involve simple mechanisms to lazily evaluate results.

Laziness, however, does pose an overhead. Haskell introduces a "thunk" for every lazy expression. A thunk describes how to calculate a deferred expression when required. For some expressions, a thunk is a potential time saver: you can defer work until needed, calculate the required value, then re-use it. For other things, such as simple arithmetic operations like vector addition, a thunk is an overhead often comparable to the original operation.

Haskell offers a couple of ways to avoid thunks and therefore lazy evalation. The main two I use are the BangPatterns extension to explicitly mark something as strictly (ie, non-lazily) evaluated, and seq.

The BangPatterns extension allows you to mark expressions that you want to be explicitly evaluated with a !. So, for a Vector, you would write Vector !Float !Float !Float !Float. This way, you force anything you assign to a vector to be strictly evaluated and stored in the vector.

Seq is used to strictly force the evaluation of a function. Seq has the signature of a -> b -> a. It therefore takes two parameters, and returns the second.

An interesting example of where lazy evaluation proven to be an overheard, rather than a win was my light accumulation function. Given a point in space to be shaded, this accumulates the contribution of every light source. Now, the inputs to this function are going to be different every time this function is called as every ray intersection is going to be unique. Thunks will therefore present an additional overhead.

I rewrote my function to use seq. I use seq to strictly evaluate the application of this function, then process the remainder of the list:

accumulateLight :: [Light] -> Colour -> SceneGraph -> Vector -> Vector -> Material -> Vector -> Colour
accumulateLight (x:xs) acc sceneGraph shadePos normal objMaterial viewDirection = let result = acc + (applyLight sceneGraph shadePos normal objMaterial viewDirection x)
in seq result (accumulateLight xs result sceneGraph shadePos normal objMaterial viewDirection)
accumulateLight [] acc _ _ _ _ _ = acc

This change alone saved over a minute of execution time! Since a raytracer generally consists of many similar kinds of loops, I am now searching for wider application of this technique. It's a very difficult call to make with a limited data set. Some functions may be faster if strictly evaluated with a small dataset, but may perform better if evaluated lazily with a larger dataset.

Profiling will guide me.

Wednesday, 23 March 2011

Pragmatism vs Idealism (DOD vs OOP)

So, it seems no blog these days is complete without a yet another stab at the DOD vs OOP argument.

Here's my take.

I could have picked any number of dichotomy based titles for this post. Objective vs Subjective. Pragmatism vs Idealism. Design vs Performance. PC vs Console. There is a tendency to view DOD and OOP as somewhat polar extremes that you are either for, or against. Personally, I see merit in both camps. It is something of a false dichotomy. I see value in the ideals of OOP and DOD. Equally, I see shortcomings in both camps. For me, it's not an either/or, it's a question of how you blend them together.

For me, the real key issue is choosing what approach to adopt for a given task. This is where the real challenge lies.

The arguments used to support OOP are essentially based on intangible qualities. Design. Re-use. Extensibility. Maintainability. Elegance. Whilst valuable, these are generally somewhat subjective qualities. What constitutes good design is based on widely held best practices, and those practices are based on a consensus of constantly evolving expert opinion. It's very hard to measure these things explicitly. It's very hard to say how much "better" a more OOP centric approach can make a piece of code.

The arguments for DOD, by contrast, are based on real-world, objective data: performance. You can directly measure performance, and directly state the improvement that a DOD approach has yielded for a given task. Similarly, you can measure the cost of an OOP based approach in terms of lost performance. Performance has a useful, marginal benefit in a software project. Other qualities have their benefits but are frequently less directly felt.

You therefore end up with a challenging situation. How can the intangible qualities that support OOP argue against the real-world, objective data supporting DOD? How can we support OOP over DOD when it is based on subjective preferences, leading to design decisions that are somewhat arbitrary. Especially when you are trying to a ship a console game!

As I see it, the performance data that supports DOD should be supporting both arguments. The current generation of consoles in many ways are indicative of what future computing platforms will generally look like. Many cores. Slow memory. Long, simple instruction pipelines.

OOP has its positive qualities, but it is important to view it in context: it largely emerged as a means of improving the structure of large software systems by breaking software into singular, encapsulated objects. Whilst these concerns are significant, there are other, equally valid ways of addressing those concerns than OOP alone. And with the continuing, accelerating trend of parallel hardware, OOP alone cannot be positioned as the primary concern in designing software systems.

For me, my primary concern in software design is parallelism first, DOD second and OOP third.

Tuesday, 22 March 2011

Sphere Tree Construction and Traversal in Haskell

Finally. I've finally finished writing and debugging my sphere tree construction code in Haskell.

It should have been so easy. Tree structures are very easily recursively defined which is ideal for functional languages like Haskell. Take your data, split it up, and build the children by recursion. Simple. Only it was anything but...

The logic of the task was simple enough. The main problem was debugging. This is the sort of task that is prone to simple mistakes that cause time-consuming debugging problems. When working in an imperative language, you can easily put down a breakpoint and inspect your data, or if you recur too much the stack blows up, or you can step through your code and see why you're getting stuck in loops. These strategies don't map well to Haskell.

What does work in Haskell is to start small, build up gradually and constantly verify your results. You end up with a development cycle that is fundamentally very similar to test driven development. You want to prove your code one little function at a time by feeding it controlled inputs and inspecting the outputs. In Haskell, your primary debugger is your brain - you need to think very carefully through your code and how your express your algorithms. Functional languages reduce the scope of creating simple bugs due to side effects, and Haskell in particular eliminates a lot of plumbing and housekeeping code. What's left is purely your algorithm - inspect it carefully, that is where your problem lies.

Bounding Volume Hierarchies

I considered a few approaches for my bounding volume hierarchy, and for now, I've settled on a fairly generic implementation of bounding volume hierarchies that currently is configured as a sphere tree.

Each tree node can optionally (Maybe) hold an object. Each node has a sphere, but that could later be generalised. The children are defined as a list, allowing me to switch using a quadtree, kdtree or octree type approach.

(Note: The code below is best copied and viewed in a real text editor. Also, I am not yet a Haskell expert, and there may be many superior ways to achieve these results. My purpose in writing is to share my experiences as a learner with other learners.)

data SphereTreeNode = SphereTreeNode { object :: Maybe Object, children :: [SphereTreeNode], boundingRadius :: Float, boundingCentre :: Vector } deriving (Show, Read)

Sphere Tree Construction

The tree itself is constructed using a fairly simple piece of Haskell:

-- Build up a sphere tree
buildSphereTree :: ([Object] -> [[Object]]) -> [Object] -> SphereTreeNode
buildSphereTree _ (obj : []) = SphereTreeNode (Just obj) [] nodeRadius nodeCentre
   where
   nodeCentre = calculateMeanPosition (obj:[])
   nodeRadius = calculateBoundingRadius (obj:[]) nodeCentre
buildSphereTree builder (obj:objs)
   | length (obj:objs) == 1 = error "Should have been handled by a different pattern"
   | length (obj:objs) == 0 = error "Should not have zero objects"
   | otherwise = SphereTreeNode Nothing nodeChildren nodeRadius nodeCentre
   where
   nodeCentre = calculateMeanPosition (obj:objs)
   nodeRadius = calculateBoundingRadius (obj:objs) nodeCentre
   nodeChildren = map (buildSphereTree builder) (builder (obj:objs))
buildSphereTree _ [] = error "Should not hit this pattern for buildSphereTree"

-- Find the mean of a collection of objects

calculateMeanPosition' :: [Object] -> Vector -> Vector

calculateMeanPosition' (obj : objects) acc = calculateMeanPosition' objects acc + (getCentre obj)

calculateMeanPosition' [] acc = acc

calculateMeanPosition :: [Object] -> Vector

calculateMeanPosition objects = setWTo1 ((calculateMeanPosition' objects zeroVector) fromIntegral (length objects))

-- Find the overall bounding radius of a list of objects

calculateBoundingRadius :: [Object] -> Vector -> Float

calculateBoundingRadius objs centre = foldr Prelude.max 0 (map (\obj -> shapeBoundingRadius (shape obj) (transform obj) centre) objs)

This object simply recursively builds a tree until it encounters an object list of size 1.

I've deliberately exercised a couple of Haskell idioms here. I've used another tail-recursive loop to calculate the mean centre of the objects, and a map/fold pair to calculate the sphere's bounding radius.

The most interesting part is the first parameter to buildSphereTree. I pass in a function of type ([Object] -> [[Object]]). This function is responsible for dividing the object list into a number of lists, each of which is the object list used to build a new child node. This trick allows me to abstract out the specific algorithm for building the tree into a user-supplied function.

In my current code, I'm using a simple KD-type approach:

-- Generate a plane to split the objects along

makeSplittingPlane :: AABB -> (Vector, Float)

makeSplittingPlane (boxMin, boxMax) = case largestAxis (boxMax - boxMin) of

                                        0 -> (xaxis, -(vecX midPoint))

                                        1 -> (yaxis, -(vecY midPoint))

                                        2 -> (zaxis, -(vecZ midPoint))

                                        _ -> error "Undefined value"

    where

      midPoint = (boxMin + boxMax) <*> 0.5

-- Make children using a kd tree

generateSceneGraphUsingKDTree :: [Object] -> [[Object]]

generateSceneGraphUsingKDTree objs = leftObjects : rightObjects : []

    where

      objBox = objectListBoundingBox objs

      (planeNormal, planeDist) = makeSplittingPlane objBox

      (leftObjects, rightObjects) = partition (\obj -> planeDist + (dot3 planeNormal $ getCentre obj) > 0) objs

We simply make a splitting plane and split the list of objects according to what side of the plane their centre lies: [[Object], [Object]].

I could easily replace this function with an alternative strategy. Or even, make a function that returns a function based on the characteristics of the presented list...

Bounding Volume Hierarchy Intersection

Now this is the real meat of the implementation. The function again takes the form of a tail-recursive loop. The code is essentially similar to implementing a recursive function using a software stack in C. The tail recursion is used to make a simple loop-type recursion rather than a true state-accumulation call-type recursion.

The code is somewhat complex as several optimisations have been added.

The code is passed a list of nodes to process. We process the first node in the list, and recur with the remainder of the list - and possibly, some newly added nodes. The code seeks to find and maintain the closest-intersecting object.

The code constructs a tuple of three values - a result for this object, a list of additional nodes to process and a ray. If an object or bounding volume is hit, this tuples holds new values. If no intersection is found, we simply continue with the existing state passed into the function.

The code first considers whether the given node has any children; if it has none, it only makes sense to test the object that the node should hold. We report an error if the node has no object (a node without children or objects is useless!).

If the node does have children, it holds a subtree we wish to accept or reject. We first test the bounding volume. If this is rejected, we simply retain the current results of the loop with no additional nodes to process. If the bounding volume is intersected, then we will at least recur to the children nodes. If the node contains an object and it is intersected, we update our current intersection results and shorten our ray.

Finally, we pattern match the empty array case to simply return the current results.

Here is the code:

intersectSphereTree :: [SphereTreeNode] -> Ray -> Maybe (Object, Float, Int) -> Maybe (Object, Float, Int)

intersectSphereTree (node:nodes) ray currentHit = intersectSphereTree (newNodeList ++ nodes) newRay thisResult

where

-- Intersect the ray with the bounding volume of this node

(thisResult, newNodeList, newRay) = case children node of

-- If the node has no children, don't bother with it's bounding volume and just check the object (if it has one)

[] -> case object node of

Nothing -> error "A node with no children should hold an object"

Just obj -> case shapeIntersect (shape obj) ray (transform obj) of

-- Didn't hit the object. Retain the current hit, and continue with remaining nodes on the list

Nothing -> (currentHit, [], ray)

-- We did hit this object. Update the intersection, and continue with remaining nodes on the list

Just (objHitDistance, objHitId) -> (Just (obj, objHitDistance, objHitId), [], shortenRay ray objHitDistance)

-- We have children. In this case it makes sense to test our bounding volume

nodeChildren -> case shapeIntersect (Sphere (boundingRadius node)) ray (translationMatrix' (boundingCentre node)) of -- (make a sphere centred at the object's transform matrix with given radius)

-- If we do not find an intersection, we do not update the results and we offer no further nodes to be traversed, thus skipping this subtree

Nothing -> (currentHit, [], ray)

-- If we do find an intersection against the bounding volume, then we try again against the actual object (if present)

Just (_, _) -> case object node of

Nothing -> (currentHit, nodeChildren, ray) -- No object; just pass to the children

Just obj -> case shapeIntersect (shape obj) ray (transform obj) of

-- Didn't hit the object. Retain the current hit, but offer up the children of the node as we hit the bounding volume

Nothing -> (currentHit, nodeChildren, ray)

-- We did hit this object. Update the intersection, and continue with the bounding volume's children

Just (objHitDistance, objHitId) -> (Just (obj, objHitDistance, objHitId), nodeChildren, shortenRay ray objHitDistance)

intersectSphereTree [] _ currentHit = currentHit

The striking thing about this code is that it steadily marches to the right of the screen, with each indentation being a new test of success or failure. Ideally, this code could be rewritten as a monad.

What About Infinite Objects?

They're simply "filter"ed out at the time of the sphere tree construction, and added to a separate list. It makes little sense to include an infinite object into any bounding volume hierarchy.

Results

The code continues to render my test scene as expected. Disappointingly, it is currently a little slower; however, this is not due to the scene graph code. I have recently rewritten my vector classes to be instances of the Num typeclass, so that I can simply write a + b, rather than coining new operators such as a <+> b. I haven't yet fully optimised my new code.

Other Observations

As I get further into Haskell, I'm finding myself increasingly thinking in terms of Lambda calculus. I no longer find myself thinking of what iterative steps I need to do to complete an operation, rather I find myself thinking in terms of the transformations of the underlying data and accompanying functions.

I find this new perspective very helpful when returning to other languages. It gives you a somewhat stream-processing-like perspective and thought process when facing standard C++ code.

What's Next?

The main focus now is to diagnose, understand and resolve the issues with my new typeclass-based fundamental data types such as Vector and Colour. I want to get it below the ~70 seconds it was previously taking. Following that, it is now time to build a parser and load some standard test scenes that will allow me to develop and optimise the ray tracer further.

Thursday, 3 March 2011

A worked Haskell example

I thought it might be interesting to do a worked example of some work-in-progress Haskell to discuss what I like about it.

The case in question is the code that traverses the scene graph and intersects a ray.

Now, I'm putting the ultimate efficiency and DOD type concerns aside for a moment. I grant you there may be more efficient/better ways of doing this, it's just for discussion purposes.

Disclaimer: I am not (yet) a Haskell expert. There are probably better ways of doing this or writing it in Haskell. I write for the reasons of discussing what I've learnt about trying to write a practical application in Haskell. If there are better ways of doing it, I'd love to know, as I'm still learning.

So, first up, the code:

intersectSceneGraph' :: [SceneGraphNode] -> Ray -> Maybe (Object, Position, Int) -> Maybe (Object, Position, Int)
intersectSceneGraph' (node:nodes) ray currentHit = intersectSceneGraph' (newNodeList ++ nodes) ray thisResult
   where
   (thisResult, newNodeList) = case shapeIntersect (Sphere $ boundingRadius node) ray (transform (object node)) of
   Nothing -> (currentHit, [])
   Just (_, _) -> case shapeIntersect nodeObjectShape ray nodeObjectTransform of
   Nothing -> (currentHit, children node)
   Just (objHitDistance, objHitId) -> (Just (object node, pointAlongRay ray objHitDistance, objHitId), children node)
   where
   nodeObjectShape = shape $ object node
   nodeObjectTransform = transform $ object node
intersectSceneGraph' [] _ currentHit = currentHit

intersectSceneGraph :: SceneGraphNode -> Ray -> Maybe (Object, Position, Int)
intersectSceneGraph node ray = intersectSceneGraph' (node : []) ray Nothing

(If the code does not display clearer in your browser, cut and paste to a text editor)

This function is split into two parts. intersectSceneGraph is essentially a front end, a wrapper for intersectSceneGraph'. intersectSceneGraph' traverses a list of scene graph nodes, tests the ray against the bounding volume, and if it intersets it, it then intersects against the contained object and recurs to the children.

So, how does it traverse a tree?

The Haskell code is very similar to C code that traverses a tree using a software stack.

The operation is defined in a tail recursive fashion. I process one element of the list, and then recur to the remainder of the list. If the node in question is a new closest hit, this gets passed through to the next recursion as the new current state. If the bounding volume is intersected, then I add the children of the node to the list. If it is not intersected, I add nothing, an empty list.

This business of passing through state seems a little odd. Why would one instance of a function pass its internal state to another? This is a Haskell idiom. By passing the state from one call to another, as an "accumulator" parameter, it allows the compiler to eliminate any intermediate state on the stack and implement the recursion as a goto loop.

And all of this is lazily evaluated.

In pseudo-C, the code might look something like this:

closestHit = null;

while((topOfStack = stack.pop()) != NULL)
{

if(intersect(ray, topOfStack->bounding volume))

{

if(intersect(ray, topOfStack->object))

{

closestHit = this one

}

push(stack, topOfStack->children);

}
}

I think this shows the elegance of Haskell. Think of what is not there. There are no braces to explain scope to the compiler: you just indent it. There are no incidental variables describing the state of loop variables and so on. The compiler automatically pattern matches and dispatches to appropriate special cases instead. No pointer walks. No null checks. No explicit shuffling of data from A to B. You have a closer, more direct description of your algorithm, and the compiler fills in the gubbins.

Of course, there is a cost to all of this. By sacrificing that lower-level control, you lose out on the ultimate optimisability of the code. Still, the objective here is to find the best way to learn and use Haskell, rather than write the ultimate real-time raytracer in SIMD assembly! :-)

Tuesday, 1 March 2011

Brook's Law, Extended

Brook's law states that adding manpower to a late project makes it later.

The increased communication and management overhead of the new staff slows down the incumbent project members.

I'm beginning to think this law extends before t=0 and into the recruitment process, too!