Well, I've managed to knock off a further half minute from the execution time. The execution time now stands at 3.5 minutes down from just over 4.
I've achieved this with two main methods: strictness, and rewriting loops to be tail recursive.
By default, Haskell is a lazy language. Haskell defers evaluation of an expression until it is needed by creating a thunk. Whilst thunks may save work, thunks are not free. Thunks are an example of a work-saving technique that do not provide a net benefit for very small pieces of work or data.
Thunks can be eliminated by marking expressions as strictly evaluated. This technique forces an expression to be evaluated when it is encountered, rather than deferred. The Haskell compiler automatically treats some expressions as strict, but a programmer can force an expression to be strict using the BangPatterns language extension. You simply prefix the variables that you wish to be strictly evaluated using a pling (!).
I went through and marked up all of the trivial expressions and fields in the raytracer code. Things like the xyzw fields of a vector, rgba fields of a colour, field-of-views, radii, that sort of thing. Eliminating those thunks provided around half of the total savings.
More information on strictness here: http://www.haskell.org/haskellwiki/Performance/Strictness
Ensuring that loops were properly tail recursive provided the second half of the savings. Haskell does not provide traditional for() loops to iterate over arrays and lists; instead, recursion must be used. This may introduce extra space and time overhead due to the state accumulated at each function call.
Tail recursive functions eliminate this overhead as they allow the compiler to use the equivalent of a goto rather than a function call to traverse the next item of a list. This essentially converts the Haskell code into something resembling a traditional for loop.
More information here:
http://www.haskell.org/haskellwiki/Tail_recursion
http://book.realworldhaskell.org/read/profiling-and-optimization.html
I applied this optimisation to the light accumulation, object intersection and ray tracing loops. I also taken the opportunity to add a few simple optimisations to the main object intersection loop.
So, all in all, a few modest gains. What's becoming very apparent is that to seriously optimise Haskell code I need to learn Core. Core is Haskell's intermediate compilation language. 3.5 minutes is still a long time to ray-trace a very simple scene and I'm sure there must be a lot of win left in there. I could brute-force it in other languages in far less! Hopefully the Core will show some major overhead in the low level vector, colour, lighting and intersection code.
Games programming, graphics programming, and general software development
Sunday, 27 February 2011
Thursday, 24 February 2011
Baby Steps in Haskell Optimisation
So, now I'm doing depth of field with 64 rays per pixel, my runtime has shot up from a handful of seconds to 5 minutes or so. Time for some Haskell optimisation.
I've been working through the often-cited Haskell optimisation strategies.
"Use Doubles instead of Floats". This one seemed a little unintuitive, but reasonable. On the one hand, many modern processors natively evaluate floating point expressions as doubles, and the Haskell compiler writes claim the Double path is a faster path. On the other hand, it's a larger data footprint. I switched over all of my floats to doubles, and that doubled the execution time. So that one turned out to be an anti-optimisation, in practice.
"Add missing type signatures". The rationale here is that if you give a type signature, you offer the compiler the opportunity to write more specialised code. This one was worth a good 20 seconds of execution time.
"Explictly export only certain symbols". Instead of making everything in your module publically visible, only export certain things. This allows the Haskell compiler the opportunity and freedom to implement optimisations to the now-hidden code. This one turned out to be quite a win, a good 30-40 seconds.
So, with this set of optimisations, I've taken it from 5 minutes down to 4 minutes. That's still a hell of a lot for a compiled program rendering a 640x480 image of a few spheres and planes. I get the (admittedly speculative) gut feeling that the code is spending a lot more time in the Haskell runtime than executing ray tracing code. So, my next set of optimisation avenues are:
* Unboxing
* Strictness
* LLVM
* Parallelism
* Manual CSE
I've been working through the often-cited Haskell optimisation strategies.
"Use Doubles instead of Floats". This one seemed a little unintuitive, but reasonable. On the one hand, many modern processors natively evaluate floating point expressions as doubles, and the Haskell compiler writes claim the Double path is a faster path. On the other hand, it's a larger data footprint. I switched over all of my floats to doubles, and that doubled the execution time. So that one turned out to be an anti-optimisation, in practice.
"Add missing type signatures". The rationale here is that if you give a type signature, you offer the compiler the opportunity to write more specialised code. This one was worth a good 20 seconds of execution time.
"Explictly export only certain symbols". Instead of making everything in your module publically visible, only export certain things. This allows the Haskell compiler the opportunity and freedom to implement optimisations to the now-hidden code. This one turned out to be quite a win, a good 30-40 seconds.
So, with this set of optimisations, I've taken it from 5 minutes down to 4 minutes. That's still a hell of a lot for a compiled program rendering a 640x480 image of a few spheres and planes. I get the (admittedly speculative) gut feeling that the code is spending a lot more time in the Haskell runtime than executing ray tracing code. So, my next set of optimisation avenues are:
* Unboxing
* Strictness
* LLVM
* Parallelism
* Manual CSE
Wednesday, 23 February 2011
Depth of Field
So, I've now got depth of field working!
It was pretty straightforward. I just used a list comprehension to generate a list of points to jitter the rays with, and then it took a lot of fiddling to get the focusing working correctly.
I've also done various refactorings to my code.
All in all, it added roughly an extra 15-20 lines of code to the tracer. And that's without writing Perl-style gibberish.
Predictably enough, though, the tracer is now running much slower - about a minute for this image on my 2008 MBP. So, the next job will be learning how to optimise code in Haskell, and throwing some threading (back) in there.
It was pretty straightforward. I just used a list comprehension to generate a list of points to jitter the rays with, and then it took a lot of fiddling to get the focusing working correctly.
I've also done various refactorings to my code.
All in all, it added roughly an extra 15-20 lines of code to the tracer. And that's without writing Perl-style gibberish.
Predictably enough, though, the tracer is now running much slower - about a minute for this image on my 2008 MBP. So, the next job will be learning how to optimise code in Haskell, and throwing some threading (back) in there.
Friday, 18 February 2011
Pet Project Resurrected
So, about 14 months or so ago, I spent a few weeks learning Haskell over Christmas. Then, I had to start work in earnest on my MSc final-year project, so the Haskell hacking went on the back burner.
Now that I'm in my last few weeks of the MSc, I've picked up Haskell again. Last time around I'd knocked myself up a basic little raytracer. I'm now starting up that project again. I want to work on some off-line rendering techniques for a change of pace from the day job.
So far, I've got the basics: reflection, refraction, shadows, materials and shaders (100% Haskell including shaders):
So what's next? I reckon:
* Distributed ray tracing
* Anti-aliasing
* More light types and lighting model types
That should be a good set of features to work on whilst I get my Haskell back up to speed. Once I'm back up to speed, I can start to look at meatier tasks like an asset pipeline or spatial partitioning.
I did have it running in parallel last time round, but that's all changed in Haskell in the last year or so, so I'll have to revisit that.
Now that I'm in my last few weeks of the MSc, I've picked up Haskell again. Last time around I'd knocked myself up a basic little raytracer. I'm now starting up that project again. I want to work on some off-line rendering techniques for a change of pace from the day job.
So far, I've got the basics: reflection, refraction, shadows, materials and shaders (100% Haskell including shaders):
So what's next? I reckon:
* Distributed ray tracing
* Anti-aliasing
* More light types and lighting model types
That should be a good set of features to work on whilst I get my Haskell back up to speed. Once I'm back up to speed, I can start to look at meatier tasks like an asset pipeline or spatial partitioning.
I did have it running in parallel last time round, but that's all changed in Haskell in the last year or so, so I'll have to revisit that.
Sparse Voxel Octrees meet Marching Cubes
Some speculation here. I've not tried this idea out yet.
I've been doing a little reading up on Sparse Voxel Octrees lately, and an idea in Samuli Laine's papers particularly interested me. In these papers, "contours" are used to add shape to the basic octree voxels to enhance detail without extra subdivision.
I quite like this idea. I've been thinking of ways to try to extend the contour idea.
How about using a marching-cubes type algorithm to provide some more detailed geometry at the node or leaf level, based on the existence of non-existence of neighbouring octree cells?
You could quickly construct a bit mask based on the existence of neighbouring cells, use that to look up in a table providing several specific pieces of geometry for each case. This could add some extra detail and shape to each octree voxel.
I've been doing a little reading up on Sparse Voxel Octrees lately, and an idea in Samuli Laine's papers particularly interested me. In these papers, "contours" are used to add shape to the basic octree voxels to enhance detail without extra subdivision.
I quite like this idea. I've been thinking of ways to try to extend the contour idea.
How about using a marching-cubes type algorithm to provide some more detailed geometry at the node or leaf level, based on the existence of non-existence of neighbouring octree cells?
You could quickly construct a bit mask based on the existence of neighbouring cells, use that to look up in a table providing several specific pieces of geometry for each case. This could add some extra detail and shape to each octree voxel.
Polymorphism
Polymorphism is one of the object oriented paradigms that leads to pain when optimising.
Why?
Virtual.
Polymorphism in itself isn't a bad thing. It's a useful design mechanism and has its place. The virtual function call adds a level of abstraction that makes it difficult to port to SPU or GPU architectures and adds a barrier to many optimisation techniques.
A common example is using polymorphism and virtual functions to process a list of heterogenous types with a common base class.
This is bad because:
(a) You're adding the cost of virtual function calls to every object
(b) You're dispatching to different pieces of code, in a potentially incoherent order, causing I$ thrash.
(c) You often end up with oversized objects that carry dead code and data around, again hurting cache efficiency
(d) It's harder to port the code to the SPU (or GPU)
I've been exploring an alternative way of structuring this approach. I've retained a polymorphic external interface for registering objects of different types in a list. However, rather than have a single add() function that takes the base type, I've explicitly made several overloaded functions that take the derived types. Each of these functions adds that object to a list of objects of the same specific derived type. So, instead of having one big heterogenous list, internally I have several type-specific homogenous lists.
This way, I eliminate all virtual calls in my processing-side code. I can now process all of the objects of type X, Y and Z. I can explicitly call X::process() in a list, Y::process(), Z::process() and so on. This makes the code much easier to optimise and much easier to port to the SPU because you no longer need to know about the vtable. I now have much better I$ coherency.
Who said we must have one list, after all? Why does the contained object have to pay the cost of the abstraction mechanism? Doesn't it make more sense for the containing object to pay the cost? And a cheaper cost at that! Provided the set of types you are dealing with remains reasonable, maintaining multiple separate lists can be an easy way to enhance efficiency and make your code easier to port to SPU.
Why?
Virtual.
Polymorphism in itself isn't a bad thing. It's a useful design mechanism and has its place. The virtual function call adds a level of abstraction that makes it difficult to port to SPU or GPU architectures and adds a barrier to many optimisation techniques.
A common example is using polymorphism and virtual functions to process a list of heterogenous types with a common base class.
This is bad because:
(a) You're adding the cost of virtual function calls to every object
(b) You're dispatching to different pieces of code, in a potentially incoherent order, causing I$ thrash.
(c) You often end up with oversized objects that carry dead code and data around, again hurting cache efficiency
(d) It's harder to port the code to the SPU (or GPU)
I've been exploring an alternative way of structuring this approach. I've retained a polymorphic external interface for registering objects of different types in a list. However, rather than have a single add() function that takes the base type, I've explicitly made several overloaded functions that take the derived types. Each of these functions adds that object to a list of objects of the same specific derived type. So, instead of having one big heterogenous list, internally I have several type-specific homogenous lists.
This way, I eliminate all virtual calls in my processing-side code. I can now process all of the objects of type X, Y and Z. I can explicitly call X::process() in a list, Y::process(), Z::process() and so on. This makes the code much easier to optimise and much easier to port to the SPU because you no longer need to know about the vtable. I now have much better I$ coherency.
Who said we must have one list, after all? Why does the contained object have to pay the cost of the abstraction mechanism? Doesn't it make more sense for the containing object to pay the cost? And a cheaper cost at that! Provided the set of types you are dealing with remains reasonable, maintaining multiple separate lists can be an easy way to enhance efficiency and make your code easier to port to SPU.
Wednesday, 28 April 2010
My MSc Research Project
The games industry has had a long-standing difficulty in successfully delivering projects to market on-time and on-cost.
The industry has tried many different methods to improve this problem; yet still projects slip, are cancelled, or require significant amounts of crunch to complete them. We have tried, with varying success, to import methodologies from other industries. Yet when placed in the context of the games industry, something is still somehow lacking.
My hypothesis is that the industry does not have a project management methodology suited to its nature. This industry is a creative entertainment industry and has different needs to other sectors of the software industry. I believe that creativity is a significant factor in the development of videogames, and that it is a factor that to-date has not been explicitly treated by any project management methodology. This is a significant shortcoming for an industry that is built on creativity. I believe that much of the reason for many development problems is that we are trying to manage a creative process with tools inappropriate to that task. Creative processes are different, and need to be managed differently.
My research project will investigate the effect of creativity on various development tasks, both routine and highly creative, in a real world project. This will be the first study that provides an investigation and understanding of what role creativity plays in videogames programming. These results will be critically analysed to consider the question of how best can we manage creativity in videogames programming?
If all goes well, this should be the first step in establishing a new direction of research and practice in the management of videogames projects.
The industry has tried many different methods to improve this problem; yet still projects slip, are cancelled, or require significant amounts of crunch to complete them. We have tried, with varying success, to import methodologies from other industries. Yet when placed in the context of the games industry, something is still somehow lacking.
My hypothesis is that the industry does not have a project management methodology suited to its nature. This industry is a creative entertainment industry and has different needs to other sectors of the software industry. I believe that creativity is a significant factor in the development of videogames, and that it is a factor that to-date has not been explicitly treated by any project management methodology. This is a significant shortcoming for an industry that is built on creativity. I believe that much of the reason for many development problems is that we are trying to manage a creative process with tools inappropriate to that task. Creative processes are different, and need to be managed differently.
My research project will investigate the effect of creativity on various development tasks, both routine and highly creative, in a real world project. This will be the first study that provides an investigation and understanding of what role creativity plays in videogames programming. These results will be critically analysed to consider the question of how best can we manage creativity in videogames programming?
If all goes well, this should be the first step in establishing a new direction of research and practice in the management of videogames projects.
Subscribe to:
Posts (Atom)