The Hummingbird HTML rendering engine’s architecture is based on tasks (also known as jobs). Units of work are dispatched to a task scheduler that runs them on different threads to make the most of the available hardware.
Hummingbird’s task system is very versatile. It allows developers to schedule tasks whose body is a C++ lambda and put them in different queues for processing. From the inception of the system we decided to maximize the developer’s freedom in creating and using tasks. This freedom can be dangerous, however, when object lifetimes and potential data races have to be taken into account.
Conceptually, tasks can be divided into two groups, although they look the same in code:
- “Simple” data-oriented tasks that execute some data transformation on an independent work group and have no side effects.
- High-level tasks that might arise from the interaction of multiple systems and objects with possibly complex lifetimes.
I try to put as much work as possible into the first type of task. A good example of such a task is the layout of a group of elements. The input data are POD structs with the styles needed for the element layout, and the output is the position of each element on the page.
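To make the idea concrete, here is a minimal sketch of what the body of such a data-oriented task could look like. The struct names, their fields, and the vertical stacking rule are assumptions made for illustration; they are not Hummingbird's actual layout code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical POD input: only the style values this layout step needs.
struct LayoutStyle {
    float width;
    float height;
    float marginTop;
};

// Hypothetical POD output: the resolved position of one element.
struct LayoutPosition {
    float x;
    float y;
};

// Stacks the elements vertically. The function reads only its input and
// writes only its return value, so it touches no shared or global state
// and can run on any thread.
std::vector<LayoutPosition> LayoutVertical(const std::vector<LayoutStyle>& styles) {
    std::vector<LayoutPosition> positions;
    positions.reserve(styles.size());
    float cursorY = 0.0f;
    for (const LayoutStyle& style : styles) {
        cursorY += style.marginTop;                  // leave room for the margin
        positions.push_back(LayoutPosition{0.0f, cursorY});
        cursorY += style.height;                     // next element starts below this one
    }
    assert(positions.size() == styles.size());       // one position per element
    return positions;
}
```

Because the input is copied or borrowed read-only and the output is a fresh value, there is nothing in such a task whose lifetime or thread affinity needs reviewing.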
High-level tasks have to deal with object lifetimes: some of the objects may be reference counted, and many can be accessed only on certain threads.
A particularly nasty error that arises from using reference-counted objects is having them destroyed at unexpected moments or on unexpected threads.
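The following sketch reproduces that kind of surprise with standard library types only. `Texture` and `DestroyedOnWorkerThread` are hypothetical names, and `std::shared_ptr` with `std::function` stands in for whatever reference counting and task types an engine actually uses.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <thread>

// Hypothetical resource that records which thread ran its destructor.
struct Texture {
    explicit Texture(std::thread::id* destroyedOn) : destroyedOn_(destroyedOn) {}
    ~Texture() { *destroyedOn_ = std::this_thread::get_id(); }
    int width = 256;
    std::thread::id* destroyedOn_;
};

// Returns true when the object is destroyed on the worker thread
// instead of the thread that created it.
bool DestroyedOnWorkerThread() {
    std::thread::id destroyedOn;  // filled in by ~Texture
    std::thread::id workerId;
    const std::thread::id creatorId = std::this_thread::get_id();

    auto texture = std::make_shared<Texture>(&destroyedOn);
    int seenWidth = 0;

    // The lambda copies the shared_ptr, so the task now co-owns the texture.
    std::function<void()> task = [texture, &seenWidth] { seenWidth = texture->width; };

    // The creating thread drops its reference; the task holds the last one.
    texture.reset();

    std::thread worker([&task, &workerId] {
        workerId = std::this_thread::get_id();
        task();
        task = nullptr;  // discarding the finished task runs ~Texture right here
    });
    worker.join();

    assert(seenWidth == 256);
    return destroyedOn == workerId && destroyedOn != creatorId;
}
```

Nothing in the lambda's syntax hints that the destructor will run on the worker. If the object may only be released on a specific thread, this is a bug that is easy to write and hard to spot.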
In Hummingbird we try to encourage “simple” tasks and to make higher-level ones hard to misuse by accident. We introduced a compile-time checking system for task parameters. Classes can be marked to allow or disallow using their instances as task parameters. There are four ways for an object to be passed as a parameter in a task:
- Passing object by value. This is always OK in our system. The object passed will be a copy private to the task so it shouldn’t involve changing global or shared state. The object can still contain pointers as members or modify global state in a method but this is better caught in design or code reviews and has never caused errors in our code so far.
- Passing object by pointer. This is generally NOT OK. By default the system will not allow passing pointers unless their class is explicitly marked by the programmer. Passing naked pointers to tasks is the source of most errors: the object’s lifetime is managed outside the task and is not well defined from the task’s point of view, and there is a chance that the object will be accessed concurrently.
- Passing weak pointers. In the beginning we implicitly allowed this usage, but recently we made weak pointers require explicit marking as well.
Explicitly marking which classes are OK to be passed between tasks has several benefits:
- Forces the programmer to think twice about the lifetime and concurrent use of data.
- Helps in code reviews by signaling to reviewers that they should carefully inspect the usage of certain objects.
- Implies a contract for how the data will be used and self-documents the code.
- Avoids inadvertently passing wrong objects to tasks, which can happen due to the easy syntax provided by lambdas.
We have also added a mechanism to disallow passing a class to tasks altogether, even by value.
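One way to picture such marking is a pair of type traits checked with `static_assert`. This is only a sketch under assumptions: every name below (`AllowTaskPointer`, `ForbidInTasks`, `IsValidTaskParam`, the two macros, and the example classes) is hypothetical, and treating weak pointers with the same opt-in as raw pointers is a simplification of mine, not a description of Hummingbird's code.

```cpp
#include <memory>
#include <type_traits>

template <typename T>
using Bare = typename std::remove_cv<T>::type;

// Hypothetical opt-in trait: may a pointer to T be handed to a task?
template <typename T>
struct AllowTaskPointer : std::false_type {};

// Hypothetical opt-out trait: is T banned from tasks altogether, even by value?
template <typename T>
struct ForbidInTasks : std::false_type {};

// Values are accepted unless their class has been banned.
template <typename T>
struct IsValidTaskParam
    : std::integral_constant<bool, !ForbidInTasks<Bare<T>>::value> {};

// Raw pointers are accepted only when the pointee class has opted in.
template <typename T>
struct IsValidTaskParam<T*>
    : std::integral_constant<bool, AllowTaskPointer<Bare<T>>::value &&
                                   !ForbidInTasks<Bare<T>>::value> {};

// Assumption: weak pointers follow the same opt-in rule as raw pointers.
template <typename T>
struct IsValidTaskParam<std::weak_ptr<T>> : IsValidTaskParam<T*> {};

// Hypothetical marking macros that keep the syntax short.
#define ALLOW_TASK_POINTER(Type) \
    template <> struct AllowTaskPointer<Type> : std::true_type {}
#define FORBID_IN_TASKS(Type) \
    template <> struct ForbidInTasks<Type> : std::true_type {}

struct Color { float r, g, b, a; };  // plain value, never marked
struct Document {};                  // reviewed, safe to pass by pointer
struct ScriptContext {};             // must never reach a task

ALLOW_TASK_POINTER(Document);
FORBID_IN_TASKS(ScriptContext);

static_assert(IsValidTaskParam<Color>::value, "values are always accepted");
static_assert(!IsValidTaskParam<Color*>::value, "unmarked pointers are rejected");
static_assert(IsValidTaskParam<Document*>::value, "marked pointers are accepted");
static_assert(!IsValidTaskParam<ScriptContext>::value, "banned even by value");
```

The marking lives next to the class definition, so a reviewer sees in one place which types were deliberately approved for crossing task boundaries.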
The implementation of the system is based on templates and some macros to make the syntax easy.
Creating a task is done in the following way:
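As a rough illustration, here is a minimal sketch of what scheduling a task with validated parameters could look like. Only the name `TASK_PARAMS` comes from this post; `TaskQueue`, `ValidateTaskParams`, `IsValidTaskParam`, `RunExample`, and the macro's expansion are assumptions made for the example, not Hummingbird's actual API. The sketch needs C++14 for the init-capture.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical rule for this sketch: anything except a raw pointer is accepted.
template <typename T>
struct IsValidTaskParam : std::integral_constant<bool, !std::is_pointer<T>::value> {};

template <typename... Ts>
struct AllValid : std::true_type {};
template <typename T, typename... Ts>
struct AllValid<T, Ts...>
    : std::integral_constant<bool, IsValidTaskParam<T>::value && AllValid<Ts...>::value> {};

// Checks every argument type at compile time. The returned value only
// serves as a placeholder for the lambda capture below.
template <typename... Ts>
constexpr bool ValidateTaskParams(const Ts&...) {
    static_assert(AllValid<Ts...>::value,
                  "a task parameter is not marked as safe to pass to tasks");
    return true;
}

// Stand-in macro (assumed expansion): validates the listed variables,
// then captures each of them by copy.
#define TASK_PARAMS(...) \
    taskParamsChecked = ValidateTaskParams(__VA_ARGS__), __VA_ARGS__

// Toy single-threaded queue; a real scheduler would run tasks on worker threads.
class TaskQueue {
public:
    void Enqueue(std::function<int()> task) { tasks_.push(std::move(task)); }

    std::vector<int> RunAll() {
        std::vector<int> results;
        while (!tasks_.empty()) {
            std::function<int()> task = std::move(tasks_.front());
            tasks_.pop();
            results.push_back(task());
        }
        return results;
    }

private:
    std::queue<std::function<int()>> tasks_;
};

std::vector<int> RunExample() {
    TaskQueue queue;
    int offset = 3;
    int scale = 4;
    queue.Enqueue([TASK_PARAMS(offset, scale)] { return offset * scale; });

    // int* raw = &offset;
    // queue.Enqueue([TASK_PARAMS(raw)] { return *raw; });  // rejected at compile time
    return queue.RunAll();
}
```

In this sketch the check costs nothing at run time: an invalid parameter stops the build with the `static_assert` message, and a valid one is captured by copy as usual.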
The key here is TASK_PARAMS, which validates each parameter at compile time. In the next blog post I’ll go into detail on how the task validation mechanism is implemented.