Well, I've managed to knock off a further half minute from the execution time. The execution time now stands at 3.5 minutes down from just over 4.
I've achieved this with two main methods: strictness, and rewriting loops to be tail recursive.
By default, Haskell is a lazy language. Haskell defers evaluation of an expression until it is needed by creating a thunk. Whilst thunks may save work, thunks are not free. Thunks are an example of a work-saving technique that do not provide a net benefit for very small pieces of work or data.
Thunks can be eliminated by marking expressions as strictly evaluated. This technique forces an expression to be evaluated when it is encountered, rather than deferred. The Haskell compiler automatically treats some expressions as strict, but a programmer can force an expression to be strict using the BangPatterns language extension. You simply prefix the variables that you wish to be strictly evaluated using a pling (!).
I went through and marked up all of the trivial expressions and fields in the raytracer code. Things like the xyzw fields of a vector, rgba fields of a colour, field-of-views, radii, that sort of thing. Eliminating those thunks provided around half of the total savings.
More information on strictness here: http://www.haskell.org/haskellwiki/Performance/Strictness
Ensuring that loops were properly tail recursive provided the second half of the savings. Haskell does not provide traditional for() loops to iterate over arrays and lists; instead, recursion must be used. This may introduce extra space and time overhead due to the state accumulated at each function call.
Tail recursive functions eliminate this overhead as they allow the compiler to use the equivalent of a goto rather than a function call to traverse the next item of a list. This essentially converts the Haskell code into something resembling a traditional for loop.
More information here:
I applied this optimisation to the light accumulation, object intersection and ray tracing loops. I also taken the opportunity to add a few simple optimisations to the main object intersection loop.
So, all in all, a few modest gains. What's becoming very apparent is that to seriously optimise Haskell code I need to learn Core. Core is Haskell's intermediate compilation language. 3.5 minutes is still a long time to ray-trace a very simple scene and I'm sure there must be a lot of win left in there. I could brute-force it in other languages in far less! Hopefully the Core will show some major overhead in the low level vector, colour, lighting and intersection code.