Robust C++ Task Systems Through Compile-time Checks

At this year’s devcom 2017 I gave a talk on our C++ task-system in Hummingbird, here it is:

Advertisements

Compile-time check on task parameters (part 2)

In my previous post in the series I wrote how in our HTML rendering engine we try to avoid accidental errors when passing parameters to tasks. A task is a unit of work that might happen on a different thread. Specifically we want to avoid passing objects of types that are not designed to be kept in a task. The previous post highlights the rationale behind the design of the system, I’ll only discuss the implementation here.

The syntax for launching the task is the following:

The syntax is very close to a normal C++ lambda, slightly modified to perform the parameter validation, which is hidden behind the TASK_PARAMS macro. At compile-time it’ll walk all the variables passed to it and make sure that their types are explicitly allowed to be passed to a task. All other types will produce a meaningful compilation error.

The macro is expanded to a ValidateTaskParameters function that looks like this:

The function inspects all parameters through variadic template expansion and performs a series of compile-time checks of their types. The meta-programming templates are somewhat annoying, but worth the complexity in this case.

The first condition for instance says “if the type is a pointer and it wasn’t explicitly allowed to be passed as pointer – generate error”. We usually don’t allow naked pointers to be passed around in tasks, but if the developer knows what she is doing, she can force-allow it.

Marking types is done with template tagging hidden behind a macro for convenience.

Additional macros are available for marking other ways to share an object: by shared pointer, by weak pointer etc. There is also tag & macro that forbid passing a type to a task altogether.

To recap, our system forces developers to think carefully what types can be passed around in tasks and reduces the chances of accidental errors. The implementation warns at the earliest possible time – during compilation and has no run-time cost and a modest compilation-time cost.

Suggestions and comments are appreciated, please share if you have tackled a similar problem.

Web Tools for Application UI

Yesterday at GDC 17, Andreas Fredriksson from Insomniac gave a fascinating talk about their experience using web tools (HTML, JS) to create AAA game editing tools. You can see the slides here: https://deplinenoise.wordpress.com/2017/03/03/slides-insomniacs-web-tools-postmortem/.

I believe that using web tools for application UI development is a good solution. At Coherent Labs, along with our main middlewares, we develop the Coherent Editor, which is a game UI development tool written with web tech, like Insomniac’s.

I want to give a sum-up of our experience developing the tool and how it compares to Insomniacs’. The Editor is still in heavy development and we are happy how it’s shaping up and the development process we built around it.

The Coherent Editor in version 1.5.3.4 looks like this:

coherent_editor

The final output is a game UI in standard HTML/CSS that can be consumed in Coherent GT or Hummingbird. This particular sample will look like this in the game:

data_binding

The Coherent Editor is currently ~100K lines of JavaScript, it’s smaller than Insomniacs’ tools, but still a very large application by standard-JS-app sizes.

JavaScript gives us the same benefits that Andreas has highlighted + some more:

  • Super-quick iteration
  • We can leverage JS libraries
  • Exercises our own runtime – Coherent GT
  • Can output optimal HTML/CSS for our runtime
  • Directly interfaces with our products
  • Can have game-specific features

Finding great JavaScript developers seems easier than C++ devs. We leverage all C++ devs only for the development our runtimes.

Andreas highlights many issues they encountered while using the web stack in their development process, I’ll sum-up how we solve them in our team:

  • Don’t use Chrome/CEF/Firefox/Edge etc.

The Coherent Editor runs on-top of Coherent GT. Here we have the advantage that we create our own technology. 5 years ago when we started Coherent Labs, we based our first generation product on chromium (the project Chrome builds upon). We quickly realised this is not so good. Chrome is an amazing beast but does too much things, changes too often and breaks things all the time.

With our own technology we keep features stable and concentrate on what makes sense for games. Generally GT is an order of magnitude faster rendering-wise compared to Chrome. Chrome has a super complicated out-of-process architecture that is useless for games and hogs memory like no other.

Andreas notes that in the end they had to freeze the Chrome version they used and that a stable custom runtime would have been better. When we started the Editor there were debates if it should be based on Chrome/Firefox, but fortunately we didn’t go that road.

  • Extend communication JS <-> C++

We have a very easy layer of data binding in GT that allows communication and data-exchange between JavaScript and C++. All the OS-specific things is directly implemented in the Editor. An API is exported from C++ to JS and it can do everything it needs – loading/saving files, launching external tools, importing from Photoshop/Illustrator and so on. This removes a huge amount of complexity, when you need something written in native, you simply go, write it and use it from JS.

Performance-heavy tasks can also go to other threads in C++, we are not limited to the threads JS allows.

  • Test all the things

We also quickly realized that a lot of JS code becomes quickly very brittle. Fortunately JS is also easy to test. We added Selenium support to Coherent GT and now the Editor team writes test for all their features, which helped a lot for the stability and confidence in the tool.

  • Use declarative data binding and components

The best code is the one that does not exist. GT has a declarative data-binding layer that allows UI developers to NOT write any JS code and attach fields, properties etc. directly to the C++ data model. It can also instantiate UI elements (components) in a “for”-cycle way.

Think an “Open file” dialog, where the “model” is the list of files read from C++ and the dialog is directly populated with the “File” components in the UI. The “File” component is a small piece of HTML with and icon, text, properties etc. This approach saves a ton of time and reduces the risk of errors.

  • Typescript

The Coherent Editor is written in Typescript, which is also what Insomniac ended up doing. Specialized scripts run autonomous and recompile and redeploy the changed JS. It happens pretty quickly at least in our application and doesn’t seem to impact iteration time significantly. I guess it depends on the overall project structure though.

  • GC configuration

We have configs on the JS GC that make it run less often during heavy tasks and allows it to do it’s work when the Editor is relatively idle. It increases the peak memory usage, but eliminates hiccups.

Conclusions

Using web-only tech with a standard browser is very, very tough to get right for a large Application UI shell. What saved us is that we have our own technology that solves the downsides of a “pure” browser-based solution.

  • Performance was not a problem – GT is a game UI runtime designed for consoles so on development machines, the Coherent Editor UI flies.
  • Data-binding really makes a difference. We designed it to make it easy for games to communicate with the UI and in essence NOT write any JavaScript. Separating the UI development from the “backend” speeds up development x10.
  • If something doesn’t work as we like – we can change it. Although Chrome is OSS, making changes there is incredibly time-consuming and risky (they break it the day after). Believe me.. I’ve gone that path.

We still have a lot of work to do on the Coherent Editor. There are operations that can put it to it’s knees (try loading 4000 images and go grab a coffee) but they are mostly related to handling specific cases than to the overall architecture. We haven’t yet achieved the complexity of Insomniacs’ tools and I hope it’ll hold when we do.

There’s a lot of merit in using HTML/JS/CSS for large applications. I know of other companies that have used web tech for their tools, if you have a story to share, please do.

Compile-time check on task parameters (part 1)

The Hummingbird HTML rendering engine’s architecture is based on tasks (also known as jobs). Units of work are dispatched to a task scheduler who runs them on different threads to maximize the resource usage of the hardware.

Multi-threaded programming however is notoriously difficult because of the risk of race conditions. An awesome overview of how to utilize the hardware is given in the following blog series.

Hummingbird’s task system is very versatile. It allows developers to schedule tasks whose body is a C++ lambda and put them in different queues for processing. We decided from the inception of the system to try maximize the freedom of the developer when creating and using tasks. This freedom however can be dangerous when the lifetime and eventual data races between objects have to be taken into account.

Tasks conceptually (although the same in code) can be divided in two groups:

  • “Simple” data-oriented tasks that execute some data transformation on an independent work group and have no side effects.
  • High-level tasks that might arise from the interaction of multiple systems and objects with possibly complex lifetimes.

I definitely try to have as much work possible in the first type of tasks. A good example of such a task is the layout of a group of elements. The input data are POD structs with the styles needed for the element layout and the output are the positions of each element on the page.

High level tasks require an interaction with the lifetime of objects, possibly some of them are reference counted and usually can be accessed only in certain threads.

A particularly nasty error that arises from using reference counted objects is having them be destroyed in unexpected moments or threads.

In Hummingbird we try to encourage “simple” tasks and to make higher level ones tough to accidentally misuse. We introduced a compile-time checking system for task parameters. Classes can be marked to allow/disallow using their instances as task parameters. There are four ways for an object to be passed as a parameter in a task:

  • Passing object by value. This is always OK in our system. The object passed will be a copy private to the task so it shouldn’t involve changing global or shared state. The object can still contain pointers as members or modify global state in a method but this is better caught in design or code reviews and has never caused errors in our code so far.
  • Passing object by pointer. This is generally NOT OK. The system will not allow by default passing pointers unless they are explicitly marked by the programmer. Passing naked pointers to tasks is the source of most errors as the lifetime of said object is probably outside the task and there is a chance that object will be accessed concurrently. There is also the issue with the lifetime of the object, which is not well defined.
  • Passing by shared pointer. DOM objects often have shared ownership due to the design of the standard DOM and the interactions with JavaScript. To pass a shared pointer to a certain type, the developer has to explicitly mark it as OK.
  • Passing weak pointers. In the beginning we implicitly allowed this usage but recently made them require explicit marking as well.

Explicitly marking which classes are OK to be passed between tasks has several benefits:

  • Forces the programmer to think twice about the lifetime and concurrent use of data.
  • Helps in code reviews by signaling to reviewers that they should carefully inspect the usage of certain objects.
  • Implies a contract for how the data will be used and self-documents the code.
  • Avoids inadvertently passing wrong objects to tasks, which can happen due to the easy syntax provided by lambdas.

We have also added a mechanism to disallow passing a class to tasks altogether, even by value.

The implementation of the system is based on templates and some macros to make the syntax easy.

Creating a task is done in the following way:

The key here is the TASK_PARAMS, which validates each parameter. In the next blog post I’ll go into details on how the task validation mechanism is implemented.

Static variables in functions bite

The other day I was investigating the performance profile of a stress test in our Coherent Labs Renoir graphics library. Renoir collects high-level rendering commands like “DrawText”, “DrawPath” etc, and transforms them in low-level API commands for the GPU.

What caught my eyes was that the function BlendingMode2State was taking ~ 1.5% of the time. The function is called for each drawing command (hundreds of times per-frame), but still this looked disproportional. The function is declared with the following signature:

inline BlendingState BlendingMode2State(BlendModes mode)

BlendModes is an enum that contains pre-defined blending modes and BlendingState is a simple structure that contains the required GPU operations to implement that blend mode (SrcBlend, DestBlend etc. for more information on graphics alpha blending, take a look here)

Looking at the implementation made things clear:

The implementation immediately rang a bell. The static variable holding the mapping of the blend modes is declared at function level. According to the lifetime rules of C++, it’ll be initialized the first time the function is called. This bring the side effect the in each cal there is branch that checks if the variable has been initialized. Not good.

Checking the disassembly revealed another inefficiency. The initialization assembly was 2 screens long at the beginning of the function – there are a lot of BlendModes. The assembly did a conditional jump if the variable is initialized and skipped hundreds of bytes worth of initialization code to go to the gist of the function, which is just returning the correct entry in the array.

The third problem was that the initialization code hindered the inlining of the function. It was so large that it made sense that the compiler ignored the “inline” request.

To recap, there were actually 3 linked issues with the function:

  • static variable at local scope requires a branch on each call
  • massive initialization code causes potential instruction cache trashing
  • inlining is impossible due to the initialization code withing the function

The fix was trivial:

I moved the variable outside the function. This allowed for proper function inlining and gave an overall 1% performance increase of the library in my test. Not bad for 10 minutes work.

In your work avoid static variables at function level – they are bad practice anyway, mostly used for singleton objects whose lifetime has to be lazy. If you have such an object that is accessed very often, you might have a similar performance hit.

Type annotation in C++

In systems like game engines and our HTML renderer Hummingbird, developers have to work with objects transformed in different coordinate systems. Using one generic type can lead to confusion on what object is required in a particular situation. Errors are often subtle and hard to track. I tried to mitigate this by using stringent static typing in our software. New types are created by annotating them with metadata.

In the HTML visual model, almost everything is a box (a rectangle), and it’s size and position is calculated by the layout engine. The boxes however have to go through additional transformations until they reach their final positions on-screen – they can be scrolled, 2D/3D transformed by CSS properties etc. This is similar to a game engine where objects also undergo different transformations between coordinate systems – model, view, world, shadow and so on. Some calculations in the code require boxes in the Layout coordinate system, others need them scrolled, others need them transformed.

We used a Rectangle C++ struct everywhere in Hummingbird to represent a Box. It became a common error to pass the wrong Rectangle to an operation. For instance a certain calculation needs a Layout box, but the programmer passes by error a scrolled one. A more explicit system was required.
If we look at a typical C++ function signature in the style:

bool HitTest(float2 coords, const Rectangle& box);

It is unclear what the exact type of the box has to be – in our software the programmer could pass a scrolled box, while a layout one is required. Testing this is also tricky, because if the box has no scroll (it is 0,0) then it’ll work most of the time but break upon scrolling.

The issue was solved by “documenting” the requirement:

// Requires a layout box
bool HitTest(float2 coords, const Rectangle& layoutBox);

Unfortunately this is still error-prone as relies only on the programmer and code reviewer attention.

It’s better to have the C++ static type system help in the situation:

bool HitTest(float2 coords, const LayoutRectangle& box);

Now this is much more clear and will avoid accidental errors.

The idea can be extended to cover also the transformations themselves that lead to a certain box. The transformation (a matrix) is also strongly typed and encodes compile-time the coordinate system it work in. Instead of having a generic Matrix class we have a ScrollMatrix, CSSTransformMatrix etc as types. The product of a box transformation is defined by the input box and the matrix applied.

LayoutRectangle lb = LayoutElement(element);
ScrollMatrix scroll = GetScroll(element);
ScrollRectangle sr = lb.Transform(scroll);

The system will automatically generate the correct type from the input parameter and the typed transform.

Implementation

The system has the following requirements:

  • No runtime overhead
  • Types are defined by the transforms they encode
    • DisplayRectangle is Rectangle with {Layout, Scroll and CSSTransform}
    • ScrollRectangle is Rectangle with {Layout and Scroll}
  • Maximum static checking on types and improve code readability
  • Typed transforms must generate correct new types
  • Types that are a subset of another are allowed to be assigned on them. For instance:
    • DisplayRectangle ds = ScrollRectangle sr(..);

The final requirement may seem like a defeat on the static type system but it makes sense. A DisplayRectangle created from a ScrollRectangle is simply one that has an identity CSSTransform. It significantly simplifies intermediate calculations and avoids redundant copies.

The implementation of the system relies on template metaprogramming. It is one of the few cases where I find a good application for it’s somewhat esoteric constructs.
Coordinate systems are defined as types:

namespace CoordComponents
{
     struct Layout{};
     struct Scroll{};
// ..
}

Typed Rectangle is a thin wrapper around our generic Rectangle class.

template<typename… Components>
class Rectangle
{
// …
};

Commonly used types are defined as:

using LayoutRectangle = Rectangle<CoordComponents::Layout>;
using ScrolledRectangle = Rectangle<CoordComponents::Layout, CoordComponents::Scroll>;

The same principle is applied to the Matrix class which has a list of transformation components as its type signature.
The gist of the method validation and type synthesis can be seen in the Unite2D and Transform methods.

template<typename… RhsComponents>
void Unite2D(const Rectangle<RhsComponents…>& other, bool allowEmpty = false)
{
    static_assert(sizeof…(Components) >= sizeof…(RhsComponents), “Cannot assign to type with less components than operand!”);
    static_assert(meta_contains_types<meta_packer<Components…>, meta_packer<RhsComponents…>>::value, “Operand has components that are not part  of this object!”);
    m_Value.Unite2D(other.m_Value, allowEmpty);
}

Unite2D can take as parameter any other type of Rectangle but we validate:

  1. That the components of the this rectangle are less or equal those of the parameter. This avoids applying “broader” transforms to narrower ones, like uniting a ScrolledRectangle on a LayoutRectangle. They belong to different coordinate systems, so the operation is invalid.
  2. The second check handles situations where the two Rectangles might have a different set of components. Uniting a Rectangle<Layout, Scroll> with a Rectangle<Layout, Transform> is invalid.

The static_assert use some meta functions that implement the actual type checking.

The Transform method show type synthesis:

template<typename… MatrixComponents>
typename meta_unite_params<Rectangle<Components…>, Matrix<MatrixComponents…>>::type
Transform(const Matrix<MatrixComponents…>& mat) const
{
    typename meta_unite_params<Rectangle<Components…>, Matrix<MatrixComponents…>>::type result;
    result.m_Value = m_Value.Transform(mat.Unwrap());
    return result;
}

The declaration is definitely a mouthful, but basically says: “make a Rectangle, whose components are the union of the current one and the ones of the Matrix”. For instance:

Rectangle lr{…};
Matrix<Scroll, Transform> mst {…};
auto result = lr.Transform(mst); /*result is Rectangle<Layout, Scroll, Transform>*/

In this case “result” will have a type Rectangle<Layout, Scroll, Transform>, that is a Rectangle that is a Layout box with scrolling and some CSS transformation.

Results

The system is relatively new in our code, so the long time impact still has to be measured. I find however that local operations and data members are now much clearer in their intent and the amount of confusion has definitely decreased. I was concerned about compilation times but found no significant slowdown since the introduction of the system.

While the implementation is somewhat complex due to the meta programming stuff, the benefits outweigh it and the usage itself is straightforward.

String interning

When we started to design the architecture of Hummingbird, we knew that an HTML rendering engine has to rely heavily on strings due to the nature of CSS style solving. CSS element ids and classes are defined by the standard as strings and the cascade resolution relies on string comparisons to work. All elements that have a class “my_class” will match a selector for said class.

String operations in C++ are notoriously slow. When solving the styles of a large page you might have to do hundreds or even thousands of string comparisons if you use a vanilla string implementation. The string comparisons are slow as most string implementations allocate their memory on the heap and contain an internal pointer to the actual “char” array and the memory has to be actually compared over the whole string. Cache misses will kill the performance.

In Hummingbird we can’t afford such a naive implementation so we looked for alternatives for how to represent the “id” and “classes” of HTML elements. The most important thing is to have quick comparisons that are vital to the style solving process.

In a game engine where all the content is well known when the game is being built, many developers rely on doing “perfect hashes” of their strings and effectively substituting them everywhere with ids. This is a great solution when you know that the whole pool of strings you’ll encounter is fixed.Unfortunately, such a solution cannot work in Hummingbird where the user can set any id and class in her styles and even generate new ones through JavaScript. Everything has to be dynamic.

We went for a well-known solution called “string interning”and in essence represents a runtime perfect hash.

In our implementation:

  • All strings of a View (basically an HTML page + scripts) are kept in contiguous memory regions.
  • The addresses of all currently interned strings are stored in a hash set.
  • When we encounter a string to be interned, we check the hash set. If the string is already in some buffer – we return the address to it, otherwise we copy the characters in a free region of our buffers, put the address in the set and return the newly created interned string.

The InternedString object that we use is simple, it contains just a const char* pointer. We guarantee that interned strings won’t move in memory.

This representation is very efficient

  • Each interned string consumes memory just once, no matter how many copies are there
  • Two interned strings are the same IFF they have the same address. This makes comparisons super-fast as we only compare pointers!
  • We still have easy access to the contents of the string via the const char*

This representation gives a great performance boost. Our implementation does not try to reclaim eventually unused strings. It is extremely unlikely in our usage scenarios to have interned strings take too much memory.

Hummingbird is a heavily parallel library, so trying to intern a string from multiple threads is a very real scenario. At the moment the “interning context” is locked. Each View has it’s own “interning context. So instead of having a system-wide interning context that risks taking too much memory and becoming a locking bottleneck, we have View-wide ones that reduce contention and practically eliminate the risk of interned strings talking too much memory even during long sessions.

The results are great with ~30% improved style solving performance compared to a version where simple strings are used.