Friday, 2 November 2012

Job Graphs in C++ 11

Introduction

I've recently been experimenting with many of the new features in C++ 11, purely for discovery and learning. One major addition is the threading library. Since using a feature is the best way to learn it, I set myself the task of writing a currently-fashionable task/job-graph system on top of these new facilities.

I'm not promoting this algorithm as the one true approach to parallelism; there are many other valuable algorithms that are far more suitable on certain occasions. This one simply seems a useful, generic solution that fits many cases fairly well. Along the way, I found a few implementation quirks that I thought might be interesting to others.

Job Graphs

Job graphs are currently finding wide favour as an approach to the challenge of building a flexible architecture for multithreaded processing. In this scheme, jobs are stored in a graph structure whose edges represent ordering and data dependencies between jobs. Assuming the graph is structured correctly, work is processed in the right order, giving a correct and timely flow of data, while nodes that share no data dependency may execute concurrently. The programmer shoulders the responsibility of structuring the graph correctly, and of ensuring any resource conflicts are resolved either via the graph or via some locking mechanism.

Typically, the jobs are pushed into some kind of queue and distributed to a thread pool. Ideally you would also use work-stealing for maximum efficiency.

The goal is to use this graph structure to allow non-dependent work to execute concurrently, saturating all hardware threads. It is a very simple idea, yet scheduling such a graph optimally is NP-hard.

C++ 11 Threading Overview

The C++ 11 threading library is really quite a lightweight, spare API. Broadly, it offers the facilities to:
  • Create and manage threads in a cross platform fashion.
  • Synchronise activity between these threads using simple primitives such as mutexes, condition variables and promises/futures.
  • Package up tasks to be executed on a thread.
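
As a flavour of the API, here is a minimal, self-contained sketch of my own (not code from the job system below) that exercises all three facilities - it creates a thread, packages a task, and synchronises on its result:

        #include <future>
        #include <iostream>
        #include <thread>

        int main()
        {
            // Package up a task and obtain a future for its eventual result
            std::packaged_task<int(int)> task([](int x) { return x * 2; });
            std::future<int> result = task.get_future();

            // Run the task on a freshly created thread
            std::thread worker(std::move(task), 21);

            // Synchronise: block until the value is available
            std::cout << result.get() << std::endl;
            worker.join();
        }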

Implementation

The implementation of a job graph requires relatively few components:
  • A job class
  • A job graph container
  • A job pool
  • A thread pool

Job Class

The job class is fairly straightforward. It serves two purposes: to package up some functionality to be executed, and to know about its precursor jobs and dependent jobs (the latter are sometimes described as continuations).

The major consideration in the design of the job class is the mechanism used to sequence job execution. Ideally, you want some kind of OS threading primitive rather than a user-space mechanism. The OS is in a far better position to manage threads, as it has clear knowledge of the state and dependencies of all of them, and can wake and sleep threads with minimal context switches. User-space code often ends up with some inefficient wake-check-sleep polling pattern, causing a lot of wasted time and context switches.
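
To illustrate the difference, here is a small sketch of my own (not code from the job system) contrasting a user-space polling loop with a blocking wait on an OS-backed primitive:

        #include <atomic>
        #include <condition_variable>
        #include <mutex>
        #include <thread>

        std::atomic<bool> workReady(false);
        std::mutex workMutex;
        std::condition_variable workAvailable;

        // User-space polling: the thread repeatedly wakes, tests and sleeps,
        // burning cycles and context switches even when there is nothing to do
        void pollForWork(void)
        {
            while(workReady == false)
            {
                std::this_thread::yield();
            }
        }

        // OS-arbitrated blocking: the thread sleeps until genuinely signalled
        void blockForWork(void)
        {
            std::unique_lock<std::mutex> lock(workMutex);
            workAvailable.wait(lock, []{ return workReady.load(); });
        }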

I therefore chose a C++ 11 thread library primitive pair: std::promise and std::future. A promise represents a contract for some piece of code to provide a value asynchronously, in a deferred context, at some point in the future. A future is the mechanism used to retrieve that value: the promise is the producer, the future is the consumer.

The promise and future abstraction provides a good mechanism to couple jobs together. A job can provide a "future" to each of its dependent jobs, and each dependent uses that future to wait for the job's completion. The future could pass data, or it could be used solely to sequence computation.
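
As a stripped-down sketch of that coupling (illustrative only - the names are my own, and a real system would hold these inside job objects), one thread fulfils a promise on completion while its dependent blocks on the corresponding future:

        #include <future>
        #include <thread>

        int main()
        {
            std::promise<int> precursorDone;                              // owned by the precursor job
            std::future<int> dependentGate = precursorDone.get_future();  // handed to the dependent

            std::thread precursor([&precursorDone]
            {
                // ... the precursor's work happens here ...
                precursorDone.set_value(1);  // the value is incidental; it merely sequences
            });

            std::thread dependent([&dependentGate]
            {
                (void)dependentGate.get();   // blocks until the precursor fulfils its promise
                // ... now safe to run the dependent's work ...
            });

            precursor.join();
            dependent.join();
        }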

A second feature of C++ 11 that proves useful for the job class is std::function and lambda functions. An std::function is simply a wrapper for some callable target. My job class contains one of these to hold the job's processing kernel. It is initialised in the constructor, which allows me to pass in a lambda function when creating the job. This provides an interesting alternative to using inheritance and virtual functions, and the resultant proliferation of classes. Now, I have just one Job class, and the data handles the variation.

Job.h:
        class Job;

        //! This monitors the completion of a number of jobs, and sends out a signal when they're all done
        class JobCompletionMonitor
        {
        public:
            JobCompletionMonitor(void) :
                m_numJobsPending(0)
            {
            }

            // Forbidden constructors
            JobCompletionMonitor(JobCompletionMonitor&) = delete;
            JobCompletionMonitor& operator=(JobCompletionMonitor&) = delete;

            // Jobs tell the monitor what's going on
            void notifyStart(Job* job);
            void notifyFinish(Job* job);

            // Other people can find out when they're all done
            void waitForCompletion(void);

            bool allJobsFinished(void) const;

        private:
            std::atomic_int m_numJobsPending;
            std::condition_variable m_allJobsComplete;
            std::mutex m_mutex;
        };

        //! A self-contained job that is executed somewhere in the frame
        class Job : public objects::CoreObject
        {
        public:
            // Priority system
            typedef unsigned int Priority;
            static const Priority HighPriority = 1000;
            static const Priority AboveNormalPriority = 5000;
            static const Priority NormalPriority = 10000;
            static const Priority BelowNormalPriority = 15000;
            static const Priority LowPriority = 20000;

            // Permitted constructors
            Job(const resources::NameManager::Name& name,
                const std::initializer_list<std::shared_ptr<Job>>& precursorJobs = {},
                std::function<void(float)> kernel = [](const float){},
                const Priority priority = NormalPriority);
            ~Job();

            // Forbidden constructors
            Job(Job& job) = delete;
            Job& operator=(Job& job) = delete;

            // Management of dependents
            void addDependent(const std::shared_ptr<Job>& dependent);
            void removeDependent(const std::shared_ptr<Job>& dependent);

            // Execution API
            void prepare(JobCompletionMonitor* monitor);
            bool canRun(void) const;
            void execute(const float timeStep);
            void cleanup(void);

            // Accessors
            inline const std::vector<std::shared_ptr<Job>>& getDependents(void) const
            {
                return m_dependents;
            }

            inline std::vector<std::shared_ptr<Job>> getPrecursorJobs(void) const
            {
                return m_precursors;
            }

            inline bool operator<(const Job& other) const
            {
                return m_priority < other.m_priority;
            }

        private:
            // Connection to other jobs
            std::vector<std::shared_ptr<Job>> m_precursors; //!< The jobs that must complete before we can run. We are a continuation of these jobs
            std::vector<std::shared_ptr<Job>> m_dependents; //!< A list of all the jobs that continue after us

            // Futures to sequence processing
            std::vector<std::future<int>> m_precursorFutures; //!< The futures that tell us when to start - one per precursor
            std::vector<std::promise<int>> m_dependentPromises; //!< The promises we fulfill to trigger other continuations

            // Override this function to implement your logic
            std::function<void(const float)> m_kernel; //!< A simple routine to call to implement the logic for this task

            // What is the execution priority of this job?
            Priority m_priority; //!< Execution priority of this job

            JobCompletionMonitor* m_completionMonitor = nullptr; //!< This object monitors the job. The job must notify it on completion
        };

Job.cpp:
        //! Have all of our jobs now finished?
        bool JobCompletionMonitor::allJobsFinished(void) const
        {
            return m_numJobsPending == 0;
        }

        //! The job tells the monitor when it starts
        void JobCompletionMonitor::notifyStart(Job* job)
        {
            INC_RT_ASSERT(job != nullptr);
            ++m_numJobsPending;
        }

        //! The job notifies the monitor that it is complete
        void JobCompletionMonitor::notifyFinish(Job* job)
        {
            INC_RT_ASSERT(job != nullptr);

            // Decrement and test in one atomic step; re-reading m_numJobsPending
            // separately would race against other finishing jobs
            if(--m_numJobsPending == 0)
            {
                // Notify under the lock so the signal cannot slip between the
                // waiter's predicate check and its wait
                std::lock_guard<std::mutex> lock(m_mutex);
                m_allJobsComplete.notify_all();
            }
        }

        //! Wait until all the jobs managed by this monitor are complete
        void JobCompletionMonitor::waitForCompletion(void)
        {
            std::unique_lock<std::mutex> lock(m_mutex);

            // Wait on a predicate: this copes with spurious wakeup, and with the
            // case where the last job finished before we began waiting
            m_allJobsComplete.wait(lock, [this]{ return m_numJobsPending == 0; });
        }

        //! Construction
        Job::Job(const resources::NameManager::Name& name,
                 const std::initializer_list<std::shared_ptr<Job>>& precursorJobs,
                 std::function<void(float)> kernel,
                 const Priority priority) :
            objects::CoreObject(name),
            m_precursors(precursorJobs),
            m_dependents(),
            m_dependentPromises(),
            m_kernel(kernel),
            m_priority(priority)
        {
        }

        //! Destruction
        Job::~Job()
        {
            // Add some assertions here to ensure the job has been correctly deleted
        }

        // Notify this job of a new dependent that expects to be told when this has finished
        void Job::addDependent(const std::shared_ptr<Job>& dependent)
        {
            INC_RT_ASSERT(dependent != nullptr);

            // If this job isn't already a dependent, then add it
            if(std::find(m_dependents.begin(), m_dependents.end(), dependent) == m_dependents.end())
            {
                m_dependents.push_back(dependent);
            }
        }

        // Remove a dependency from this job
        void Job::removeDependent(const std::shared_ptr<Job>& dependent)
        {
            INC_RT_ASSERT(dependent != nullptr);

            // If this job is currently a dependent, then remove it
            auto it = std::find(m_dependents.begin(), m_dependents.end(), dependent);
            if(it != m_dependents.end())
            {
                // Remove this dependent
                m_dependents.erase(it);
            }
        }

        //! Reset the job ready for graph traversal and execution
        void Job::prepare(JobCompletionMonitor* monitor)
        {
            INC_RT_ASSERT(monitor != nullptr);
            m_completionMonitor = monitor;

            // Ask our precursors for a future
            for(auto& precursor : m_precursors)
            {
                precursor->m_dependentPromises.push_back(std::promise<int>());
                m_precursorFutures.push_back(precursor->m_dependentPromises.back().get_future());
            }
        }

        //! Clean up after execution. Remove all the promises and futures
        void Job::cleanup(void)
        {
            m_precursorFutures.clear();
            m_dependentPromises.clear();
        }

        // Can this job run? Have all of our precursor futures been satisfied?
        bool Job::canRun(void) const
        {
            for(auto& it : m_precursorFutures)
            {
                // valid() only tells us a shared state exists, not that the precursor
                // has finished; poll the future for readiness instead
                if(it.wait_for(std::chrono::seconds(0)) != std::future_status::ready)
                {
                    return false;
                }
            }

            return true;
        }

        // Actually run this job
        void Job::execute(const float timeStep)
        {
            // Receive the promised values from all precursor jobs
            // (get() blocks until the value is ready, so no separate wait() is needed)
            for(auto& future : m_precursorFutures)
            {
                (void)future.get();
            }

            // Do our actual work
            m_kernel(timeStep);

            // Fulfill promises to our children
            for(auto& promise : m_dependentPromises)
            {
                promise.set_value(1);
            }

            m_completionMonitor->notifyFinish(this);

            // Now that we are done, clear out all of the stale old futures we were waiting for -
            // prepare() will re-issue them next time around
            m_precursorFutures.clear();
        }
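
To show how the lambda kernel and precursor list work together, here is a hypothetical construction sketch (the job names and kernels are invented for illustration; getUniqueName is the engine helper used in the constructor above):

        // One Job class, varied purely by data: each kernel is just a lambda
        std::shared_ptr<Job> update(new Job(resources::getUniqueName("Update"),
            {}, [](const float timeStep) { /* run the simulation */ }));

        // "Render" is a continuation of "Update", expressed via the precursor list
        std::shared_ptr<Job> render(new Job(resources::getUniqueName("Render"),
            { update }, [](const float timeStep) { /* submit draw calls */ }));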

Job Graph

The job graph really has few responsibilities: to contain the jobs, traverse the graph, and manage the resources needed for execution.

The jobs are held in a simple, hand-rolled graph structure. I may later shift to a more capable container such as boost::graph.

The main task for the job graph is to traverse the graph, pushing all of the jobs into an ordered pool. A thread pool then pulls jobs from this pool as they become runnable.

JobGraph.h:
        // A graph structure holding all the jobs and conditions
        class JobGraph
        {
        public:
            JobGraph(const size_t numThreads);
            ~JobGraph();

            void add(const std::shared_ptr<Job>& job);
            void remove(const std::shared_ptr<Job>& job);

            void execute(const float timeStep);

        private:
            // At the start of the frame, the job graph is traversed and all the jobs to be processed are placed into this pool
            JobPool m_jobPool;

            // This is a pool of threads which work on executing those jobs
            // Threads are free to steal jobs from the overall pool to keep things ticking forwards
            ThreadPool m_threadPool;

            // This is a special root-level job
            std::shared_ptr<Job> m_rootJob;

            // Need to measure when all jobs have completed
            // (not just dispatched - *completed*)
            JobCompletionMonitor m_jobCompletionMonitor;

            // Mutually exclude operations like construction, adding jobs and execution
            std::mutex m_mutex;

            // Does a (parent) job exist within the graph?
            std::unordered_map<Job*, bool> m_jobContained;
        };

JobGraph.cpp:
        // A generic minimal visitor pattern
        class JobGraphVisitor
        {
        public:
            virtual ~JobGraphVisitor() {}

            void traverse(Job* job)
            {
                INC_RT_ASSERT(job != nullptr);

                visit(job);

                for(auto child : job->getDependents())
                {
                    traverse(child.get());
                }
            }

        private:
            virtual void visit(Job* job) {}
        };

        // Visit with a view to pushing a job to the task pool
        class PushJobVisitor : public JobGraphVisitor
        {
        public:
            inline PushJobVisitor(JobPool& jobPool, const float timeStep, JobCompletionMonitor* jobCompletionMonitor) :
                m_jobPool(jobPool),
                m_timeStep(timeStep),
                m_jobCompletionMonitor(jobCompletionMonitor)
            {
            }

        private:
            JobPool& m_jobPool;
            const float m_timeStep;
            JobCompletionMonitor* const m_jobCompletionMonitor;
            std::unordered_map<Job*, bool> m_jobPushed;

            virtual void visit(Job* job) override
            {
                if(m_jobPushed[job] == false)
                {
                    m_jobCompletionMonitor->notifyStart(job);
                    m_jobPool.push(job);
                    m_jobPushed[job] = true;
                }
            }
        };

        // Reset jobs in preparation for execution
        class ResetJobVisitor : public JobGraphVisitor
        {
        public:
            inline ResetJobVisitor(JobCompletionMonitor* jobCompletionMonitor) :
                m_jobCompletionMonitor(jobCompletionMonitor)
            {
            }

        private:
            JobCompletionMonitor* const m_jobCompletionMonitor;
            std::unordered_map<Job*, bool> m_jobPrepared;

            virtual void visit(Job* job) override
            {
                // Guard against repeated visits: in a diamond-shaped graph a node is
                // reached once per path, and preparing it twice would issue duplicate
                // promise/future pairs
                if(m_jobPrepared[job] == false)
                {
                    job->prepare(m_jobCompletionMonitor);
                    m_jobPrepared[job] = true;
                }
            }
        };

        //! Post-execution by-product clean-up
        class CleanUpJobVisitor : public JobGraphVisitor
        {
        public:
            inline CleanUpJobVisitor(void)
            {
            }

        private:
            virtual void visit(Job* job) override
            {
                job->cleanup();
            }
        };

        // Construction
        JobGraph::JobGraph(const size_t numThreads) :
            m_jobPool(),
            m_threadPool(numThreads, m_jobPool),
            m_rootJob(new Job(resources::getUniqueName("Root Job")))
        {
            INC_RT_ASSERT(numThreads != 0);
            std::unique_lock<std::mutex> lock(m_mutex);
        }

        // Destruction
        JobGraph::~JobGraph()
        {
            INC_RT_ASSERT(m_jobPool.jobAvailable() == false);
            m_jobPool.shutdown();
            m_threadPool.shutdown();
            std::this_thread::yield();
        }

        // Add a job to the graph
        void JobGraph::add(const std::shared_ptr<Job>& job)
        {
            INC_RT_ASSERT(job != nullptr);
            std::unique_lock<std::mutex> lock(m_mutex);

            if(job->getPrecursorJobs().size() == 0)
            {
                // No precursors means that it's a root-level task that lives under our
                // special root job
                m_rootJob->addDependent(job);
            }
            else
            {
                // Assert to ensure that the parent job actually exists within the graph!
                for(auto& precursor : job->getPrecursorJobs())
                {
                    INC_RT_ASSERT(m_jobContained[precursor.get()]);
                    precursor->addDependent(job);
                }
            }

            m_jobContained[job.get()] = true;
        }

        // Remove a job from the graph
        void JobGraph::remove(const std::shared_ptr<Job>& job)
        {
            INC_RT_ASSERT(job != nullptr);
            std::unique_lock<std::mutex> lock(m_mutex);

            if(job->getPrecursorJobs().size() == 0)
            {
                m_rootJob->removeDependent(job);
            }
            else
            {
                for(auto& precursor : job->getPrecursorJobs())
                {
                    precursor->removeDependent(job);
                }
            }

            m_jobContained[job.get()] = false;
        }

        // Execute all jobs in the graph
        void JobGraph::execute(const float timeStep)
        {
            std::unique_lock<std::mutex> lock(m_mutex);

            // Make sure the job queue is fully empty from last time
            INC_RT_ASSERT(m_jobPool.jobAvailable() == false);

            // Get it all reset ready for execution
            {
                ResetJobVisitor resetJobVisitor(&m_jobCompletionMonitor);
                resetJobVisitor.traverse(m_rootJob.get());
            }

            // Traverse the whole job graph, pushing the jobs into the task pool
            {
                PushJobVisitor pushJobVisitor(m_jobPool, timeStep, &m_jobCompletionMonitor);
                pushJobVisitor.traverse(m_rootJob.get());
            }

            // (Note that tasks will be consumed asynchronously as soon as we start to push them)

            m_jobCompletionMonitor.waitForCompletion();
            
            // We should be all done now. Nothing should be here

            INC_RT_ASSERT(m_jobPool.jobAvailable() == false);

            // Get it all cleaned up ready for next time
            {
                CleanUpJobVisitor cleanUpVisitor;
                cleanUpVisitor.traverse(m_rootJob.get());
            }
        }

Job Pool

It is initially tempting to simply traverse the graph, push all of the jobs out for execution, and let the OS arbitrate everything. There are a few difficulties here. First of all, unless your target platform's runtime explicitly models tasks or jobs, this tends to lead to mass thread creation and destruction - a substantial overhead. Secondly, you may wish to cancel the execution of jobs after a precursor has completed. Thirdly, it makes little sense to dispatch many jobs that have no chance of running yet; they will just immediately hit a threading hazard and context-switch away. Finally, consider the case where you have N cores and N+1 jobs. Job 0 does some work, and jobs 1..N wait on job 0. A poorly-implemented runtime might assign jobs 1..N to cores instead of job 0; those jobs would wait, and job 0 might never execute. This case is unlikely in a well-implemented runtime, but it is a risk - and why present code that is inherently at risk?

The job pool is simply an std::set of jobs, ordered by job priority. Whenever a thread in the thread pool wakes up, it pops the first (highest-priority) runnable job from the pool and executes it.

JobPool.h:
        // This represents a pool of tasks that need doing, for all threads to
        // pull from (and therefore work-steal from)
        class JobPool
        {
        public:
            JobPool(void);
            JobPool(JobPool&) = delete;
            JobPool& operator=(JobPool&) = delete;

            bool jobAvailable(void);
            void push(Job* job);
            Job* pop(void);

            void waitForJob(void);
            bool shouldTerminate(void);
            void shutdown(void);

        private:
            //! This gets notified when a task is pushed, to allow the other consuming
            //! threads to wake up
            std::condition_variable m_jobPushNotification;
            std::condition_variable m_queueEmptyNotification;

            //! Comparator to order jobs by priority rather than by pointer value; ties
            //! break on address so distinct jobs of equal priority are not deduplicated
            struct JobPriorityCompare
            {
                bool operator()(const Job* lhs, const Job* rhs) const
                {
                    if(*lhs < *rhs) { return true; }
                    if(*rhs < *lhs) { return false; }
                    return lhs < rhs;
                }
            };

            //! We maintain a priority-ordered set of jobs, so that we can find the first,
            //! highest-priority job that is in an executable state
            std::set<Job*, JobPriorityCompare> m_jobs;

            //! This mutex is used to ensure thread safety when pushing a job in or pulling it out
            std::mutex m_mutex;

            //! Used when it's time to shut down
            std::atomic<bool> m_shouldTerminate;
        };

JobPool.cpp:
        // Construct it
        JobPool::JobPool() :
            m_shouldTerminate(false)
        {
        }

        // A consumer calls this function to wait for notification via the condition_variable of a new task
        void JobPool::waitForJob(void)
        {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_jobPushNotification.wait_for(lock, std::chrono::milliseconds(100));

            // The above timeout means that spurious wakeup is possible
        }

        // Is there a job available in the set?
        bool JobPool::jobAvailable(void)
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            return m_jobs.empty() == false;
        }

        // Push a new job into the pool
        void JobPool::push(Job* job)
        {
            INC_RT_ASSERT(job != nullptr);
            std::lock_guard<std::mutex> lock(m_mutex);

            INC_RT_ASSERT(m_shouldTerminate == false);
            m_jobs.insert(job);
            m_jobPushNotification.notify_one();
        }

        // Try to return an executable job from the pool
        Job* JobPool::pop(void)
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            if(m_jobs.empty())
            {
                // Nothing to run!
                return nullptr;
            }
            else
            {
                // Find the highest priority job that can run
                for(auto job : m_jobs)
                {
                    if(job->canRun())
                    {
                        m_jobs.erase(job);
                        return job;
                    }
                }
            }

            // It is possible that we're asked for a job but none are able to run... yet
            return nullptr;
        }

        //! Should we terminate now?
        bool JobPool::shouldTerminate(void)
        {
            return m_shouldTerminate;
        }

        //! Send termination message
        void JobPool::shutdown(void)
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_shouldTerminate = true;
            m_jobs.clear();
            m_jobPushNotification.notify_all();
        }

Thread Pool

The purpose of the thread pool is to provide a persistent set of threads capable of executing jobs, saving on the cost of thread creation and destruction. These threads pull jobs from the job pool until they are asked to terminate.

Spurious Wakeup

The threads in the thread pool wait on an std::condition_variable, which notifies them of the availability of a job. It is important to consider the possibility of spurious wakeup in this system; when a thread wakes up, it must therefore check that a job actually exists in the pool to be executed.

I actually deliberately allow the std::condition_variable wait (via wait_for()) to time out after 100 ms. The rationale is that some operating systems become suspicious of threads that sleep for very long periods. For stability reasons, I prefer my system to wake up unnecessarily and service the outer thread loop than to wait for an unbounded time. This deliberately introduces some degree of spurious wakeup into the system.
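
For contrast, here is a small standalone sketch (my own, not the pool code) of the two waiting styles - the conventional predicated wait that may block indefinitely, and the bounded wait used here, which deliberately admits spurious wakeups:

        #include <chrono>
        #include <condition_variable>
        #include <mutex>

        std::mutex poolMutex;
        std::condition_variable jobPushed;
        bool jobReady = false;  // stands in for "the pool is non-empty"

        // Conventional form: the predicate is re-checked on every wakeup, and the
        // thread may block for an unbounded period
        void waitPredicated(void)
        {
            std::unique_lock<std::mutex> lock(poolMutex);
            jobPushed.wait(lock, []{ return jobReady; });
        }

        // The form used in this post: a bounded sleep; the caller must re-check the
        // pool's real state on return, since the wait may simply have timed out
        void waitBounded(void)
        {
            std::unique_lock<std::mutex> lock(poolMutex);
            jobPushed.wait_for(lock, std::chrono::milliseconds(100));
        }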

ThreadPool.h:
        // A pool of threads hungry to execute tasks
        class ThreadPool
        {
        public:
            ThreadPool(const size_t numThreads, JobPool& jobPool);
            ~ThreadPool();

            void shutdown(void);

        private:
            //! The set of threads in the thread pool
            std::vector<std::thread> m_threads;
        };

ThreadPool.cpp:
        // Construction / destruction
        ThreadPool::ThreadPool(const size_t numThreads, JobPool& jobPool)
        {
            for(size_t thread = 0; thread < numThreads; ++thread)
            {
                m_threads.push_back(std::thread([](JobPool* jobPool)
                    {
                        for(;;)
                        {
                            // Sleep, until a job is available
                            // (use a condition_variable here)
                            jobPool->waitForJob();

                            // Have we been told to shut down?
                            if(jobPool->shouldTerminate())
                            {
                                return;
                            }

                            // Pull a job out of the task pool and execute it
                            if(jobPool->jobAvailable())
                            {
                                // Whilst we'd prefer to execute task() here, instead we have to do
                                // the pop-and-execute as an atomic-like operation to avoid
                                // two threads popping the same task, or one thread popping
                                // a task that the other thread expected to receive
                                Job* job = jobPool->pop();
                                if(job != nullptr)
                                {
                                    // Clearly the correct time step should be plumbed here
                                    job->execute(1.0f / 30.0f);
                                }
                            }
                        }
                    }, &jobPool));
            }
        }

        ThreadPool::~ThreadPool()
        {
        }

        // Shutdown the threads. You must have told the JobPool to terminate
        void ThreadPool::shutdown(void)
        {
            for(auto& thread : m_threads)
            {
                thread.join();
            }
        }

Findings

The new C++ 11 libraries largely made this system quick and easy to implement. They are clearly specified, unobtrusive and easy to use, with few dependencies.

One of the main difficulties encountered in the implementation of this system is simply the life cycle of C++ 11 std::promise and std::future objects. Their life cycles are not clearly documented and can be difficult to reason about. It was only by reading the STL source that I discovered these primitives fundamentally follow a "use and discard" policy: a promise/future pair can be consumed only once. This leads to the need for some code to create and destroy these resources each frame. This may vary across implementations, but it seems the safest least-common-denominator assumption. It is also largely why I avoided std::packaged_task: with move semantics transferring the object between threads and queues, it becomes very difficult to reason about the life cycle of the object. Furthermore, once you have your own job pool and thread pool, a packaged task is largely redundant; it only seems useful for "go and load some assets on a thread" type work.
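
That single-use behaviour is easy to demonstrate with a standalone sketch (independent of the job system above):

        #include <cassert>
        #include <future>

        int main()
        {
            std::promise<int> p;
            std::future<int> f = p.get_future();

            p.set_value(42);
            assert(f.get() == 42);

            // The shared state is now consumed: the future is spent...
            assert(f.valid() == false);

            // ...and re-using the pair throws std::future_error:
            // p.set_value(43);   would throw promise_already_satisfied
            // p.get_future();    would throw future_already_retrieved
        }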

The system itself has thus far proven stable. I have unit tests for a variety of graph configurations, which all pass, and it successfully manages to run my current engine. This engine is in its infancy, and so has not massively load tested this system... but, ultimately, this is meant to be an illustrative, exploratory post!
 

Thursday, 17 May 2012

The Coder-Trap of Hypotheses

As my project gets towards the end, again, I'm doing a fair bit of debugging. Every so often, a horrible bug crops up that takes a few days, a week, two weeks to fix. Every time I have one, I try to think, on reflection: what could I have done to fix it quicker?

I've noticed a pattern emerging.

Briefly, bugs take me longer to fix when I allow myself to be seduced into chasing half-formed hypotheses about what the bug is. These hypotheses often prove ill-founded, leading to wasted time that adds no real data or evidence to the bug search. Really, those intermediate hypotheses are little more than speculation or conjecture.

I'm becoming increasingly aware that the more effective strategy is to hold off forming hypotheses until as late as possible; instead, concentrate on gathering data and evidence, and on devising test fixtures. This is a very difficult discipline, as people naturally love to formulate hypotheses to explain things. They are seductive. They feel like a short-cut to the solution: you've caught your quarry, the tension of the bug hunt is over, you get to announce your triumph.

But it's rare that you fix the most difficult bugs with a single brilliant moment of insight. More often, it takes hours of analysis and evidence gathering. That's the most reliable way to reach a well-founded hypothesis that completely fixes the bug, rather than a partial fix, a temporary fix or a heisenbug fix.

Thursday, 12 April 2012

Interruption Handling Techniques for Programmers

Once more, I find myself in the post-alpha run to the finish line. And once again, I find myself getting busier and busier as the completion date looms.

One of the most challenging issues at this time is managing the stream of traffic to your desk. Other people are very busy too. People have questions. People need answers. And answering those questions takes time away from handling your own workload, allowing it to multiply like bacteria. I find programmers tend to employ several archetypal techniques to handle those interruptions:

1. Just help them. Indulge them. Offer them as much help as you can. Get their problem fixed, as it helps the team move forward. Really, this is the ideal strategy for the whole project. Someone has come to you with a problem; they may be at their wits' end; you should help them fix it.

The problem is that it's a win-win-lose proposition. It's a win for the individual, it's a win for the team, but it's a lose for you.

2. Say you don't know. Strangely, this is a rare tactic. People often feel it's preferable to have a stab, and speculate as to the cause or the solution, even without any useful knowledge or data. Email is rife with this kind of thing. This isn't actually useful. The individual with the problem is likely to have conducted much the same process and reached similar conclusions. It's nice that you're trying, but this just consumes time and doesn't advance a solution. Better to say you don't know, and refer them to someone else who may be able to help.

3. Pretend you don't know. You may know the answer, but since fully helping is a time sink for you, you may judge it more profitable to deny all knowledge and continue work.

Until someone rats you out.

Not a great plan. Not professional. Instead, it's better to admit knowledge where you have it, but if you have a more pressing matter, explain yourself and offer help when you're done. If it's urgent, they can find help elsewhere; if not, they can wait. Chances are they'll solve it anyway in the meantime, increasing their own knowledge and learning.

4. Measured help. Give them just enough help to get them moving forward again, but stop short of full disclosure. The advantages to this technique are that you're offering help, the individual is employing their own initiative and learning and everyone gets to move forward. The downside is that they may return to your desk a further 38 times before the problem is solved.

5. Be brusque. Make people not want to ask. This is a win for you, in the short term, but a lose for the team and a lose for yourself in the long term as people will not offer help in return. It can be difficult to resist this strategy when you're stressed!

6. Refer them to a subordinate. The joys of seniority! Pass the problem down the line to the next guy. This has mixed results. It may be a win for the delegate, as they learn something and increase their standing within the team. It may be a win for the individual. It's certainly a win for you, as it frees up your time. The problem comes if the delegate cannot solve the problem: the delegate burns a lot of time, the individual with the problem loses time, and ultimately you end up having to help out anyway. Choose carefully.

7. Dazzle them with science. This is a strange one. When faced with a non-technical user, the programmer dazzles them with incredible technical knowledge and prowess. A scintillating analysis, definition and solution to the problem is offered; the individual looks back, blankly. No progress is made. The individual retreats, and contacts your line manager to request that you implement the solution instead. You clearly know what you're talking about.

8. Debate them on it. One for the orators. Any problem can be negated with a gallant argument, excellent critical analysis and undeniable logic. Your inquirer retreats, defeated. No more problem, you think! No. Instead, the inquirer learns to cut you out of the loop, and to ratify and implement a solution elsewhere, elsehow.

9. Blame the build server / Visual Studio / Perforce / Whatever. Clearly the programmer doesn't honestly believe this is the case. It's just a play to buy some time, or to momentarily offload the stress.

10. Counter-problem! You come to me with a problem? You're gonna leave with a bigger one! Thats the Chicago way.

Saturday, 11 February 2012

R & D - It keeps you honest!

I've recently been working in an R&D group, developing various features that are quite experimental and wouldn't normally be scheduled as part of a project. This is intended to be "practical" R&D - we aim to develop features that have a chance of making it into the game, as opposed to blue-sky features that would never actually ship.

This kind of development subtly shifts your approach to building features. Briefly, you become very focused on doing what is necessary to get the feature into the game. You focus on making sure the feature is on-budget. You make sure that you consult with affected parties. You work out the consequences of incorporating the system and seek to mitigate any problems. You suddenly find yourself becoming a salesman, trying to establish a coalition of support for the feature. Your feature really has to prove itself worthwhile, so that management will entertain the perceived risk of incorporating it.

Strangely, you don't always find yourself doing these things during the development of "standard" features. Because they are planned as part of the development effort, their inclusion in the title is automatically assured - barring any major disasters. You don't have to sell them; these features don't need to earn their place. Consequently, it can be very easy to find yourself "going down a rabbit hole" with features like this - exploring functionality that is interesting, but not necessarily focused on shipping. Perhaps these features are ultimately riskier, as they won't be quite so critically examined as the R&D features.

I tend to find that the whole "posture" you adopt during the development of an R&D feature that is not automatically included in the game is ultimately much better from the perspective of careful software development. It does at least feel more like the kind of process intended by the creators of various software development methodologies. It feels very much along the lines of the approach advocated by Michael Saladino in his "Why Red Lining Your Team Leads to Ineffectiveness".

Perhaps it's not for everyone. Perhaps it's not for every team. Perhaps some may find it quite a pressured, stressful kind of approach. But I really quite like it. It certainly feels a much more considered, deliberate approach. It seems that developing features in branches, that have to prove their worth for integration, sets in motion a slightly changed, more focused set of processes than may be typical.


Thursday, 9 February 2012

Sparse Voxel Octrees in Haskell

It's been a long time coming (my daughter's birth impedes Haskell coding!), but I've finally implemented and uploaded support for Sparse Voxel Octrees in my Haskell Renderer.

It's quite an interesting algorithm to implement in a language like Haskell, because it really questions the balance of strict versus lazy evaluation. Which is better - strict construction and strict traversal, with the platform optimisations that may bring, or lazy construction-as-you-traverse?

In an imperative language, you'd run a complete process to construct the tree, and then traverse the completed tree for each intersection query. You pay the total construction cost plus the traversal cost. Since you'd be constructing and storing the whole tree, you need to invest significant effort in data compression. On the other hand, the direct data representation would permit a number of optimisations aimed at evaluation efficiency.

With Haskell, the lazy evaluation thunks can do some of that work for you. Instead of trying to crunch the whole tree down to be as small as possible, you may choose instead to rely on the Haskell runtime to construct the relevant portions just-in-time. You only pay the construction cost for the parts of the tree you actually traverse. Would this sum be lower than a strict construction and traversal?

Alternatively, in Haskell, you can still go for the strict option and implement it much as you would in C.

What's interesting is the challenging, general question this raises about what offers the most efficiency: a domain-specific, efficiently implemented, clever algorithm, or a somewhat more generic algorithm, less specifically optimised, relying instead on lazy evaluation and the higher-order optimisations it may automatically permit?

I'm not sure what the answer is to this question right now. I suspect there is a cross over point that is very specific to the data set in question and therefore your particular application.

Code

My current Haskell SVO code is on github, in my Haskell Renderer:

I'm going to fix up the various remaining TODOs and then pursue a more optimised, C-like version along the lines of the various SIMD-like SVO implementations that have recently emerged.

Implementation Notes

This was ultimately quite an elegant algorithm to implement in Haskell. Much of the algorithm is built off simple geometric tests. These are quite compact and elegant to represent in a functional language like Haskell. For instance, a ray-box test is simply:

boundingBoxIntersectRay :: AABB -> Ray -> Maybe (Double, Double)
boundingBoxIntersectRay (bounds0, bounds1) (Ray rayOrg _ invRayDir rayLen)
     | tmax < tmin = Nothing  -- the slab intervals do not overlap: the ray misses the box
     | tmin > 0 && tmin < rayLen = Just (tmin, tmax `Prelude.min` rayLen)
     | tmax > 0 && tmax < rayLen = Just (tmax, tmax)
     | otherwise = Nothing
     where
       (tmin, tmax) = foldr1 (\(a0, b0) (a1, b1) -> (a0 `Prelude.max` a1, b0 `Prelude.min` b1)) (map intercepts [vecX, vecY, vecZ])
       intercepts f = let x0 = (f bounds0 - f rayOrg) * f invRayDir
                          x1 = (f bounds1 - f rayOrg) * f invRayDir
                      in (x0 `Prelude.min` x1, x0 `Prelude.max` x1)

Here we simply define an operation to calculate the slab intercepts of the box, repeat that operation over X, Y and Z, then fold to reduce the results. We're offering a description of the algorithm to Haskell in high-level primitives, permitting optimisation and reduction - as opposed to clumsily trying to indirectly tell a C++ compiler what you'd really like it to do. It's surprising how often this sort of thing proves faster in Haskell. For example, using the "any" or "all" functions has proven faster than a chain of || or && clauses for me in the past. I could well imagine that a thunk-less version of Haskell, free of the irritation of considering C++ aliasing, might produce some very efficient code.

The traversal code is a simple recursive traversal, with few (as yet) clever tricks. It terminates at the leaves, at the maximum traversal depth, or when the feature becomes too small when projected to screen space (though this calculation needs further elaboration). It therefore gives a simple, automatic form of LOD control.

And so, finally, here it is: our SVO lego-sphere.

Thursday, 2 February 2012

Peak Abstraction


An interesting phenomenon I've observed over my years as a programmer is the occurrence of "Peak Abstraction" within a young programmer - and their subsequent improvement.

There is a common pattern. It typically occurs after 3-4 years of professional practice. A programmer is growing in confidence. He begins to tackle more complex, challenging problems, and he reflects those problems in complex, challenging data structures. He has not yet learnt the fine art of simplicity.

Every member of every data structure is a pointer or a reference. There are no simple data types. Everything becomes a "value dictionary", a "graph-based visitor" or a "multi-dimensional ring buffer". Disassembly of his code reveals it to be little more than an infinite sequence of lwz, lwz, lwz, lwz and more lwz. Perhaps he/she develops a hobbyist interest in Java. They may develop an editor that connects boxes with lines - such editors are great for filling with lots and lots of abstraction! Often, she/he may favour Boost; perhaps he even becomes your office's Boost Guy. Even simple plain-old-data types such as integers are suddenly held within a reference-counted smart-pointer container - with a pimpl to retrieve their value.

Those around them become increasingly desperate. They can see the code becoming inflated, inefficient and unmaintainable. Reading this individual's code is like being exposed to a radioactive isotope: you can only tolerate a few moments before exhibiting signs of sickness. You attempt to modify their code; 478 pages of template errors dissuade you from considering this foolish course of action again.

And then it happens. The programmer realises the error of their ways. They realise that they are adding neither design value nor computational value - only overhead and maintenance trouble. Perhaps they undergo their first encounter with profiling software, which discloses the full terror of the D$ dirty bomb they have unleashed upon the code. Perhaps they are "introduced" to the profiler by an irate lead in some sort of shotgun-wedding affair. Whatever the trigger, the programmer realises that solving complex problems does not require needlessly complex data structures. Such structures impress nobody. People are impressed by simple, elegant solutions. And thus the recovery begins. Data structures become simple, expressive and maintainable. Data cache misses dip under 20ms/frame for the first time. Co-workers enjoy modifying their code. The compiler begins to emit instructions such as "add" or "mul" - though it may still be too fond of "jump" at this stage... The programmer, now reformed, becomes wary of others who are pre-Peak Abstraction.

One even wonders if, like its namesake "Peak Oil", there may be a global Peak Abstraction. Perhaps the practising body of programmers will all undergo Peak Abstraction in the next 5-10 years, and we will all be left never having to refactor some template monstrosity again.

But then, where's the fun in that? Don't programmers just love to be indignant about others' code?

Tuesday, 13 September 2011

Haskell Photon Mapper Released to Hackage

I've now uploaded my Haskell photon mapper and raytracer to hackage. It is listed as a package called "crocodile" (named after my son's current obsession with crocodiles!).

It's all working on the Cornell box. It's pretty memory-hungry, and has a few self-declared things to fix (e.g. irradiance gradients), but it's out there to play with.

Enjoy.