Compile-time check on task parameters (part 2)

In my previous post in the series I described how our HTML rendering engine tries to avoid accidental errors when passing parameters to tasks. A task is a unit of work that might run on a different thread. Specifically, we want to avoid passing objects of types that are not designed to be kept in a task. The previous post covers the rationale behind the design of the system, so I’ll only discuss the implementation here.

The syntax for launching the task is the following:

The syntax is very close to a normal C++ lambda, slightly modified to perform the parameter validation, which is hidden behind the TASK_PARAMS macro. At compile time it walks all the variables passed to it and makes sure that their types are explicitly allowed to be passed to a task. All other types produce a meaningful compilation error.

The macro is expanded to a ValidateTaskParameters function that looks like this:

The function inspects all parameters through variadic template expansion and performs a series of compile-time checks of their types. The meta-programming templates are somewhat annoying, but worth the complexity in this case.

The first condition, for instance, says “if the type is a pointer and it wasn’t explicitly allowed to be passed as a pointer – generate an error”. We usually don’t allow naked pointers to be passed around in tasks, but if the developer knows what she is doing, she can force-allow it.

Marking types is done with template tagging hidden behind a macro for convenience.

Additional macros are available for marking other ways to share an object: by shared pointer, by weak pointer etc. There is also a tag and a macro that forbid passing a type to a task altogether.
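To make the idea concrete, here is a minimal sketch of how tagging and validation could fit together. It is not Hummingbird's actual code: only ValidateTaskParameters and TASK_PARAMS are names taken from the text, while the traits (AllowTaskPointer, AllowTaskSharedPtr, ForbidInTask), the marking macros, TaskParamCheck, CheckTaskParam and the example types are assumptions made for illustration. Weak pointers are left out for brevity.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical tags: a type opts in (or out) by specializing these traits.
template <typename T> struct AllowTaskPointer : std::false_type {};
template <typename T> struct AllowTaskSharedPtr : std::false_type {};
template <typename T> struct ForbidInTask : std::false_type {};

// Hypothetical convenience macros that hide the specializations.
#define ALLOW_POINTER_IN_TASK(Type) \
    template <> struct AllowTaskPointer<Type> : std::true_type {}
#define ALLOW_SHARED_PTR_IN_TASK(Type) \
    template <> struct AllowTaskSharedPtr<Type> : std::true_type {}
#define FORBID_IN_TASK(Type) \
    template <> struct ForbidInTask<Type> : std::true_type {}

// Detects std::shared_ptr<T> and exposes the pointee type.
template <typename T> struct SharedPtrTraits
{
    static constexpr bool isShared = false;
    using pointee = void;
};
template <typename T> struct SharedPtrTraits<std::shared_ptr<T>>
{
    static constexpr bool isShared = true;
    using pointee = typename std::remove_cv<T>::type;
};

// Collects the per-type conditions as compile-time booleans.
template <typename T>
struct TaskParamCheck
{
    using Decayed = typename std::decay<T>::type;
    using RawPointee = typename std::remove_cv<
        typename std::remove_pointer<Decayed>::type>::type;
    using SharedPointee = typename SharedPtrTraits<Decayed>::pointee;

    static constexpr bool pointerOk =
        !std::is_pointer<Decayed>::value || AllowTaskPointer<RawPointee>::value;
    static constexpr bool sharedOk =
        !SharedPtrTraits<Decayed>::isShared || AllowTaskSharedPtr<SharedPointee>::value;
    static constexpr bool notForbidden = !ForbidInTask<Decayed>::value;
    static constexpr bool value = pointerOk && sharedOk && notForbidden;
};

// Turns a failed condition into a readable compilation error.
template <typename T>
constexpr bool CheckTaskParam()
{
    static_assert(TaskParamCheck<T>::pointerOk,
                  "Raw pointer to a type that is not marked with ALLOW_POINTER_IN_TASK");
    static_assert(TaskParamCheck<T>::sharedOk,
                  "Shared pointer to a type that is not marked with ALLOW_SHARED_PTR_IN_TASK");
    static_assert(TaskParamCheck<T>::notForbidden,
                  "This type is marked with FORBID_IN_TASK");
    return true;
}

template <typename... Params>
void ValidateTaskParameters(const Params&...)
{
    // The pack expansion instantiates CheckTaskParam for every parameter type.
    const bool checked[] = { true, CheckTaskParam<Params>()... };
    (void)checked;
}

// Assumed expansion of the macro; the real one also wires up the lambda.
#define TASK_PARAMS(...) ValidateTaskParameters(__VA_ARGS__)

// Example types, purely illustrative.
struct Style { int width; };
struct Node {};
struct Buffer {};
ALLOW_POINTER_IN_TASK(Buffer);
```

With these definitions TASK_PARAMS(style, &buffer) compiles, while TASK_PARAMS(&node) stops the build with the first message, because Node was never marked as safe to pass by pointer.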

To recap, our system forces developers to think carefully about which types can be passed around in tasks and reduces the chances of accidental errors. The implementation warns at the earliest possible time, during compilation, and has no run-time cost and only a modest compilation-time cost.

Suggestions and comments are appreciated. Please share if you have tackled a similar problem.

Web Tools for Application UI

Yesterday at GDC 17, Andreas Fredriksson from Insomniac gave a fascinating talk about their experience using web tools (HTML, JS) to create AAA game editing tools. You can see the slides here: https://deplinenoise.wordpress.com/2017/03/03/slides-insomniacs-web-tools-postmortem/.

I believe that using web tools for application UI development is a good solution. At Coherent Labs, along with our main middlewares, we develop the Coherent Editor, which is a game UI development tool written with web tech, like Insomniac’s.

I want to give a summary of our experience developing the tool and how it compares to Insomniac’s. The Editor is still in heavy development, and we are happy with how it’s shaping up and with the development process we built around it.

The Coherent Editor in version 1.5.3.4 looks like this:

[Image: coherent_editor]

The final output is a game UI in standard HTML/CSS that can be consumed in Coherent GT or Hummingbird. This particular sample will look like this in the game:

[Image: data_binding]

The Coherent Editor is currently ~100K lines of JavaScript. It’s smaller than Insomniac’s tools, but still a very large application by the standards of typical JS apps.

JavaScript gives us the same benefits that Andreas has highlighted, plus some more:

  • Super-quick iteration
  • We can leverage JS libraries
  • Exercises our own runtime – Coherent GT
  • Can output optimal HTML/CSS for our runtime
  • Directly interfaces with our products
  • Can have game-specific features

Finding great JavaScript developers seems easier than finding C++ devs. We use our C++ devs only for the development of our runtimes.

Andreas highlights many issues they encountered while using the web stack in their development process. I’ll summarize how we solve them in our team:

  • Don’t use Chrome/CEF/Firefox/Edge etc.

The Coherent Editor runs on top of Coherent GT. Here we have the advantage that we create our own technology. 5 years ago, when we started Coherent Labs, we based our first-generation product on Chromium (the project Chrome builds upon). We quickly realised this was not so good. Chrome is an amazing beast but does too many things, changes too often and breaks things all the time.

With our own technology we keep features stable and concentrate on what makes sense for games. Generally GT is an order of magnitude faster rendering-wise compared to Chrome. Chrome has a super complicated out-of-process architecture that is useless for games and hogs memory like no other.

Andreas notes that in the end they had to freeze the Chrome version they used and that a stable custom runtime would have been better. When we started the Editor there were debates whether it should be based on Chrome/Firefox, but fortunately we didn’t go down that road.

  • Extend communication JS <-> C++

We have a very easy-to-use layer of data binding in GT that allows communication and data exchange between JavaScript and C++. All the OS-specific things are implemented directly in the Editor. An API is exported from C++ to JS and it can do everything it needs – loading/saving files, launching external tools, importing from Photoshop/Illustrator and so on. This removes a huge amount of complexity: when you need something written in native code, you simply write it and use it from JS.

Performance-heavy tasks can also go to other threads in C++; we are not limited to the threads JS allows.

  • Test all the things

We also quickly realized that a lot of JS code becomes very brittle. Fortunately JS is also easy to test. We added Selenium support to Coherent GT and now the Editor team writes tests for all their features, which has helped a lot with the stability of the tool and our confidence in it.

  • Use declarative data binding and components

The best code is the code that does not exist. GT has a declarative data-binding layer that allows UI developers to NOT write any JS code and attach fields, properties etc. directly to the C++ data model. It can also instantiate UI elements (components) in a “for”-loop fashion.

Think of an “Open file” dialog, where the “model” is the list of files read from C++ and the dialog is directly populated with the “File” components in the UI. The “File” component is a small piece of HTML with an icon, text, properties etc. This approach saves a ton of time and reduces the risk of errors.

  • TypeScript

The Coherent Editor is written in TypeScript, which is also what Insomniac ended up doing. Specialized scripts run autonomously and recompile and redeploy the changed JS. It happens pretty quickly, at least in our application, and doesn’t seem to impact iteration time significantly. I guess it depends on the overall project structure though.

  • GC configuration

We have configs for the JS GC that make it run less often during heavy tasks and allow it to do its work when the Editor is relatively idle. This increases the peak memory usage, but eliminates hiccups.

Conclusions

Using web-only tech with a standard browser is very, very tough to get right for a large Application UI shell. What saved us is that we have our own technology that solves the downsides of a “pure” browser-based solution.

  • Performance was not a problem – GT is a game UI runtime designed for consoles so on development machines, the Coherent Editor UI flies.
  • Data-binding really makes a difference. We designed it to make it easy for games to communicate with the UI and in essence NOT write any JavaScript. Separating the UI development from the “backend” speeds up development x10.
  • If something doesn’t work as we like – we can change it. Although Chrome is OSS, making changes there is incredibly time-consuming and risky (they break it the day after). Believe me, I’ve gone down that path.

We still have a lot of work to do on the Coherent Editor. There are operations that can bring it to its knees (try loading 4000 images and go grab a coffee) but they are mostly related to handling specific cases rather than to the overall architecture. We haven’t yet reached the complexity of Insomniac’s tools and I hope it’ll hold when we do.

There’s a lot of merit in using HTML/JS/CSS for large applications. I know of other companies that have used web tech for their tools. If you have a story to share, please do.

Compile-time check on task parameters (part 1)

The Hummingbird HTML rendering engine’s architecture is based on tasks (also known as jobs). Units of work are dispatched to a task scheduler that runs them on different threads to maximize the resource usage of the hardware.

Multi-threaded programming however is notoriously difficult because of the risk of race conditions. An awesome overview of how to utilize the hardware is given in the following blog series.

Hummingbird’s task system is very versatile. It allows developers to schedule tasks whose body is a C++ lambda and put them in different queues for processing. We decided from the inception of the system to try to maximize the freedom of the developer when creating and using tasks. This freedom however can be dangerous when object lifetimes and potential data races have to be taken into account.

Conceptually, tasks can be divided into two groups (although they look the same in code):

  • “Simple” data-oriented tasks that execute some data transformation on an independent work group and have no side effects.
  • High-level tasks that might arise from the interaction of multiple systems and objects with possibly complex lifetimes.

I definitely try to have as much work as possible in the first type of task. A good example of such a task is the layout of a group of elements. The input data are POD structs with the styles needed for the element layout and the output is the position of each element on the page.

High-level tasks have to deal with the lifetime of objects, some of which may be reference counted and can usually be accessed only on certain threads.

A particularly nasty error that arises from using reference counted objects is having them destroyed at unexpected moments or on unexpected threads.

In Hummingbird we try to encourage “simple” tasks and to make higher level ones tough to accidentally misuse. We introduced a compile-time checking system for task parameters. Classes can be marked to allow/disallow using their instances as task parameters. There are four ways for an object to be passed as a parameter in a task:

  • Passing object by value. This is always OK in our system. The object passed will be a copy private to the task so it shouldn’t involve changing global or shared state. The object can still contain pointers as members or modify global state in a method but this is better caught in design or code reviews and has never caused errors in our code so far.
  • Passing object by pointer. This is generally NOT OK. By default the system will not allow passing pointers unless they are explicitly marked by the programmer. Passing naked pointers to tasks is the source of most errors: the object usually lives outside the task, so its lifetime is not well defined from the task’s point of view and there is a chance it will be accessed concurrently.
  • Passing by shared pointer. DOM objects often have shared ownership due to the design of the standard DOM and the interactions with JavaScript. To pass a shared pointer to a certain type, the developer has to explicitly mark it as OK.
  • Passing weak pointers. In the beginning we implicitly allowed this usage but recently made them require explicit marking as well.

Explicitly marking which classes are OK to be passed between tasks has several benefits:

  • Forces the programmer to think twice about the lifetime and concurrent use of data.
  • Helps in code reviews by signaling to reviewers that they should carefully inspect the usage of certain objects.
  • Implies a contract for how the data will be used and self-documents the code.
  • Avoids inadvertently passing wrong objects to tasks, which can happen due to the easy syntax provided by lambdas.

We have also added a mechanism to disallow passing a class to tasks altogether, even by value.

The implementation of the system is based on templates and some macros to make the syntax easy.

Creating a task is done in the following way:

The key here is the TASK_PARAMS, which validates each parameter. In the next blog post I’ll go into details on how the task validation mechanism is implemented.

Static variables in functions bite

The other day I was investigating the performance profile of a stress test in our Coherent Labs Renoir graphics library. Renoir collects high-level rendering commands like “DrawText”, “DrawPath” etc., and transforms them into low-level API commands for the GPU.

What caught my eye was that the function BlendingMode2State was taking ~1.5% of the time. The function is called for each drawing command (hundreds of times per frame), but still this looked disproportionate. The function is declared with the following signature:

inline BlendingState BlendingMode2State(BlendModes mode)

BlendModes is an enum that contains pre-defined blending modes and BlendingState is a simple structure that contains the required GPU operations to implement that blend mode (SrcBlend, DestBlend etc.; for more information on graphics alpha blending, take a look here).

Looking at the implementation made things clear:

The implementation immediately rang a bell. The static variable holding the mapping of the blend modes is declared at function level. According to the lifetime rules of C++, it’ll be initialized the first time the function is called. This brings the side effect that in each call there is a branch that checks if the variable has been initialized. Not good.

Checking the disassembly revealed another inefficiency. The initialization assembly was 2 screens long at the beginning of the function – there are a lot of BlendModes. The assembly did a conditional jump if the variable is initialized and skipped hundreds of bytes worth of initialization code to go to the gist of the function, which is just returning the correct entry in the array.

The third problem was that the initialization code hindered the inlining of the function. It was so large that it made sense that the compiler ignored the “inline” request.

To recap, there were actually 3 linked issues with the function:

  • a static variable at local scope requires a branch on each call
  • massive initialization code causes potential instruction cache thrashing
  • inlining is impossible due to the initialization code within the function

The fix was trivial:

I moved the variable outside the function. This allowed for proper function inlining and gave an overall 1% performance increase of the library in my test. Not bad for 10 minutes of work.
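The before and after can be sketched roughly as follows. This is an illustration under assumptions, not the Renoir code: the enum values, the BlendFactor type, the MakeState helper and the g_BlendStates name are hypothetical, and the real table has many more entries. MakeState is deliberately not constexpr so that the table needs dynamic initialization, which is the case where a function-local static gets a guard check (an optimizer may still fold such a small table into constant data).

```cpp
#include <cassert>

enum BlendModes { BM_Normal, BM_Additive, BM_Multiply, BM_Count };
enum BlendFactor { BF_Zero, BF_One, BF_SrcAlpha, BF_InvSrcAlpha, BF_DestColor };

struct BlendingState
{
    BlendFactor SrcBlend;
    BlendFactor DestBlend;
};

// Hypothetical helper; not constexpr, so tables built with it are
// initialized at run time.
inline BlendingState MakeState(BlendFactor src, BlendFactor dest)
{
    BlendingState state = { src, dest };
    return state;
}

// Before: the table lives inside the function. It is initialized on the
// first call, so every call checks a guard and the initialization code is
// part of the function body.
inline BlendingState BlendingMode2StateLocal(BlendModes mode)
{
    static const BlendingState states[BM_Count] = {
        MakeState(BF_SrcAlpha, BF_InvSrcAlpha), // BM_Normal
        MakeState(BF_SrcAlpha, BF_One),         // BM_Additive
        MakeState(BF_DestColor, BF_Zero),       // BM_Multiply
    };
    assert(mode >= 0 && mode < BM_Count);
    return states[mode];
}

// After: the table is initialized once before main() runs, and the function
// body shrinks to a single array lookup that is easy to inline.
const BlendingState g_BlendStates[BM_Count] = {
    MakeState(BF_SrcAlpha, BF_InvSrcAlpha), // BM_Normal
    MakeState(BF_SrcAlpha, BF_One),         // BM_Additive
    MakeState(BF_DestColor, BF_Zero),       // BM_Multiply
};

inline BlendingState BlendingMode2State(BlendModes mode)
{
    assert(mode >= 0 && mode < BM_Count);
    return g_BlendStates[mode];
}
```

Both functions return the same values; only where and when the table is built differs. In a header shared by several translation units the table would additionally need external linkage (for example a C++17 inline variable), which the sketch ignores.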

In your work avoid static variables at function level – they are bad practice anyway, mostly used for singleton objects whose lifetime has to be lazy. If you have such an object that is accessed very often, you might have a similar performance hit.

W3C WebVR workshop 2016

On October 19 and 20 I had the amazing opportunity to take part in the W3C WebVR Workshop in San Jose. The event affirms VR as a major direction for the future of the web.

At Coherent Labs, we create HTML renderers so it was great to meet so many other browser developers that work on the same problems we do and share a similar mindset.

The first day, the event was scheduled to start at 8:30. My colleague George and I are in San Francisco, so this meant an early wake-up at 5 o’clock. After a succession of trolley bus, CalTrain and Uber we arrived at the Samsung offices in San Jose. The content more than made up for the early wake-up and the travel.

I’d like to take the opportunity to thank Samsung for hosting the event and organizing it flawlessly with the W3C folks.

The event was packed with browser and VR developers including people from Mozilla, the Chrome team, Oculus, Samsung, Valve, the Edge team and many others. Everybody was really open to sharing ideas. Browser developers shared the current state of VR support in their products along with their short-term release plans.

The workshop was spread over 2 days, with the highlights being an opening and a closing keynote, many “lightning” (5-minute) talks and “breakout sessions” on the second day. In breakout sessions different groups discussed ideas in many areas of VR integration with the web.

Currently most VR-related work in browsers is in the WebVR standard. This is an extension over WebGL that allows interfacing with HMDs and rendering to them via WebGL. This approach opens many opportunities for content creation, as developers don’t need to learn to use a game engine. They can use their familiar JavaScript and web development skills. The web also simplifies content distribution – users just have to navigate to a web page and be immersed.

Of the lightning talks I was most interested in the one by Josh Carpenter from the Chrome team and the ones from Justin Rogers from Oculus. Josh shared his vision of the future of the web in VR, while Justin focused more on technical aspects and performance – many of the problems are similar to the ones we tackle in Coherent Labs’ products.

In the “breakout” sessions, I attended a very interesting one initiated by Tony Parisi (who is now Head of AR/VR at Unity) on an eventual future declarative 3D standard for the web. The idea was met with a lot of enthusiasm in the meeting, although people had different reasons they believed it was important. Half the people attending believe that it’ll ease authoring, a smaller percentage stressed the importance of homogeneity between platforms, while ~15% believe performance is the most important reason. Coming from game development I also voted for the performance reason.

The workshop ended with a recap and the commitment to reconvene soon (at least by W3C metrics ~ 1 year). All the materials from the meeting are public and available here.

The discussions identified many areas in which to work and there are still many open questions. WebVR requires writing a completely new website. While this might be feasible for new sites, it doesn’t solve the need to meaningfully display the billions of pages that are currently on the web. The DOM is still completely separated from the VR world, so all the usual layout, styling etc. are still off-limits in VR.

Performance is an open problem, especially on mobile, where battery drain and overheating can make the experience particularly unpleasant for the user. The current browsers do a lot of things besides just rendering the scene and that can introduce stuttering and break immersion.

Last but not least, there is no solution for AR. Bringing all the AR-specifics – spatial interactions, real world elements etc. will be a real challenge in the following years.

The discussions opened some interesting questions for me personally as well. Our Hummingbird HTML renderer is super fast on mobile. Equipping it with a WebVR renderer is a way forward that would bring great benefits to VR users, who would be able to stay immersed for longer and with better visuals.

Hummingbird scales great on modern mobile platforms, so overheating and stuttering will not happen.

Overall the W3C workshop was an eye-opening experience for me. The lion’s share of the work however is still ahead of us as a community. I’m very happy to be part of it!

Type annotation in C++

In systems like game engines and our HTML renderer Hummingbird, developers have to work with objects transformed in different coordinate systems. Using one generic type can lead to confusion on what object is required in a particular situation. Errors are often subtle and hard to track. I tried to mitigate this by using stringent static typing in our software. New types are created by annotating them with metadata.

In the HTML visual model, almost everything is a box (a rectangle), and its size and position are calculated by the layout engine. The boxes however have to go through additional transformations until they reach their final positions on-screen – they can be scrolled, 2D/3D transformed by CSS properties etc. This is similar to a game engine where objects also undergo different transformations between coordinate systems – model, view, world, shadow and so on. Some calculations in the code require boxes in the Layout coordinate system, others need them scrolled, others need them transformed.

We used a Rectangle C++ struct everywhere in Hummingbird to represent a Box. It became a common error to pass the wrong Rectangle to an operation. For instance a certain calculation needs a Layout box, but the programmer passes by error a scrolled one. A more explicit system was required.
If we look at a typical C++ function signature in the style:

bool HitTest(float2 coords, const Rectangle& box);

It is unclear what the exact type of the box has to be – in our software the programmer could pass a scrolled box, while a layout one is required. Testing this is also tricky, because if the box has no scroll (it is 0,0) then it’ll work most of the time but break upon scrolling.

A first attempt at solving the issue is “documenting” the requirement:

// Requires a layout box
bool HitTest(float2 coords, const Rectangle& layoutBox);

Unfortunately this is still error-prone, as it relies only on the attention of the programmer and the code reviewer.

It’s better to have the C++ static type system help in the situation:

bool HitTest(float2 coords, const LayoutRectangle& box);

Now this is much clearer and will avoid accidental errors.

The idea can be extended to also cover the transformations themselves that lead to a certain box. The transformation (a matrix) is also strongly typed and encodes at compile time the coordinate system it works in. Instead of having a generic Matrix class we have ScrollMatrix, CSSTransformMatrix etc. as types. The product of a box transformation is defined by the input box and the matrix applied.

LayoutRectangle lb = LayoutElement(element);
ScrollMatrix scroll = GetScroll(element);
ScrollRectangle sr = lb.Transform(scroll);

The system will automatically generate the correct type from the input parameter and the typed transform.

Implementation

The system has the following requirements:

  • No runtime overhead
  • Types are defined by the transforms they encode
    • DisplayRectangle is Rectangle with {Layout, Scroll and CSSTransform}
    • ScrollRectangle is Rectangle with {Layout and Scroll}
  • Maximum static checking of types and improved code readability
  • Typed transforms must generate correct new types
  • A type whose components are a subset of another type’s components can be assigned to that type. For instance:
    • DisplayRectangle dr = ScrollRectangle(..);

The final requirement may seem like a defeat of the static type system but it makes sense. A DisplayRectangle created from a ScrollRectangle is simply one that has an identity CSSTransform. It significantly simplifies intermediate calculations and avoids redundant copies.

The implementation of the system relies on template metaprogramming. It is one of the few cases where I find a good application for its somewhat esoteric constructs.
Coordinate systems are defined as types:

namespace CoordComponents
{
     struct Layout{};
     struct Scroll{};
// ..
}

Typed Rectangle is a thin wrapper around our generic Rectangle class.

template<typename... Components>
class Rectangle
{
// …
};

Commonly used types are defined as:

using LayoutRectangle = Rectangle<CoordComponents::Layout>;
using ScrollRectangle = Rectangle<CoordComponents::Layout, CoordComponents::Scroll>;

The same principle is applied to the Matrix class which has a list of transformation components as its type signature.
The gist of the method validation and type synthesis can be seen in the Unite2D and Transform methods.

template<typename... RhsComponents>
void Unite2D(const Rectangle<RhsComponents...>& other, bool allowEmpty = false)
{
    static_assert(sizeof...(Components) >= sizeof...(RhsComponents), "Cannot assign to type with less components than operand!");
    static_assert(meta_contains_types<meta_packer<Components...>, meta_packer<RhsComponents...>>::value, "Operand has components that are not part of this object!");
    m_Value.Unite2D(other.m_Value, allowEmpty);
}

Unite2D can take as parameter any other type of Rectangle but we validate:

  1. That this rectangle has at least as many components as the parameter. This avoids applying “broader” rectangles to narrower ones, like uniting a ScrollRectangle into a LayoutRectangle. They belong to different coordinate systems, so the operation is invalid.
  2. The second check handles situations where the two Rectangles might have a different set of components. Uniting a Rectangle<Layout, Scroll> with a Rectangle<Layout, Transform> is invalid.

The static_asserts use some meta functions that implement the actual type checking.

The Transform method shows type synthesis:
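The meta functions themselves are not shown in the post, so the following is only a guess at what meta_packer and meta_contains_types could look like. The names come from the static_asserts above; the helper meta_contains, the recursive implementation and the three tag structs are assumptions for illustration.

```cpp
#include <type_traits>

// Stand-ins for the coordinate system tags.
struct Layout {};
struct Scroll {};
struct Transform {};

// Wraps a parameter pack so it can be passed around as a single type.
template <typename... Ts>
struct meta_packer {};

// Hypothetical helper: true when T appears somewhere in Ts...
template <typename T, typename... Ts>
struct meta_contains : std::false_type {};

template <typename T, typename Head, typename... Tail>
struct meta_contains<T, Head, Tail...>
    : std::integral_constant<bool,
          std::is_same<T, Head>::value || meta_contains<T, Tail...>::value> {};

// True when every type packed in Sub is also packed in Super.
template <typename Super, typename Sub>
struct meta_contains_types;

// An empty pack is contained in anything.
template <typename... Super>
struct meta_contains_types<meta_packer<Super...>, meta_packer<>>
    : std::true_type {};

// Check the first type, then recurse on the rest.
template <typename... Super, typename Head, typename... Tail>
struct meta_contains_types<meta_packer<Super...>, meta_packer<Head, Tail...>>
    : std::integral_constant<bool,
          meta_contains<Head, Super...>::value &&
          meta_contains_types<meta_packer<Super...>, meta_packer<Tail...>>::value> {};
```

With this definition a Rectangle<Layout, Scroll> accepts a Rectangle<Layout> operand, while a Rectangle<Layout, Transform> operand fails the second static_assert because Transform is not among the target’s components.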

template<typename... MatrixComponents>
typename meta_unite_params<Rectangle<Components...>, Matrix<MatrixComponents...>>::type
Transform(const Matrix<MatrixComponents...>& mat) const
{
    typename meta_unite_params<Rectangle<Components...>, Matrix<MatrixComponents...>>::type result;
    result.m_Value = m_Value.Transform(mat.Unwrap());
    return result;
}

The declaration is definitely a mouthful, but basically says: “make a Rectangle, whose components are the union of the current one and the ones of the Matrix”. For instance:

Rectangle<Layout> lr{…};
Matrix<Scroll, Transform> mst {…};
auto result = lr.Transform(mst); /*result is Rectangle<Layout, Scroll, Transform>*/

In this case “result” will have a type Rectangle<Layout, Scroll, Transform>, that is a Rectangle that is a Layout box with scrolling and some CSS transformation.
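The union itself can be computed entirely at the type level. The sketch below shows one possible meta_unite_params; the name is taken from the Transform declaration, while the helpers meta_contains and meta_append_unique, the empty Rectangle and Matrix templates and the tag structs are hypothetical stand-ins that carry no data.

```cpp
#include <type_traits>

// Stand-ins for the coordinate system tags and the typed wrappers.
struct Layout {};
struct Scroll {};
struct Transform {};

template <typename... Components> struct Rectangle {};
template <typename... Components> struct Matrix {};

// Hypothetical helper: true when T appears somewhere in Ts...
template <typename T, typename... Ts>
struct meta_contains : std::false_type {};

template <typename T, typename Head, typename... Tail>
struct meta_contains<T, Head, Tail...>
    : std::integral_constant<bool,
          std::is_same<T, Head>::value || meta_contains<T, Tail...>::value> {};

// Hypothetical helper: appends T to a Rectangle's component list unless it
// is already there.
template <typename Rect, typename T>
struct meta_append_unique;

template <typename T, typename... Cs>
struct meta_append_unique<Rectangle<Cs...>, T>
{
    using type = typename std::conditional<
        meta_contains<T, Cs...>::value,
        Rectangle<Cs...>,
        Rectangle<Cs..., T>>::type;
};

// Walks the matrix components one by one and folds them into the rectangle.
template <typename Rect, typename Mat>
struct meta_unite_params;

template <typename... Cs>
struct meta_unite_params<Rectangle<Cs...>, Matrix<>>
{
    using type = Rectangle<Cs...>;
};

template <typename... Cs, typename Head, typename... Tail>
struct meta_unite_params<Rectangle<Cs...>, Matrix<Head, Tail...>>
{
    using type = typename meta_unite_params<
        typename meta_append_unique<Rectangle<Cs...>, Head>::type,
        Matrix<Tail...>>::type;
};
```

Applying a Matrix<Scroll> to a rectangle that is already scrolled leaves the type unchanged in this sketch, because components that are already present are not appended a second time.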

Results

The system is relatively new in our code, so the long-term impact still has to be measured. I find however that local operations and data members are now much clearer in their intent and the amount of confusion has definitely decreased. I was concerned about compilation times but found no significant slowdown since the introduction of the system.

While the implementation is somewhat complex due to the meta programming stuff, the benefits outweigh it and the usage itself is straightforward.

String interning

When we started to design the architecture of Hummingbird, we knew that an HTML rendering engine has to rely heavily on strings due to the nature of CSS style solving. CSS element ids and classes are defined by the standard as strings and the cascade resolution relies on string comparisons to work. All elements that have a class “my_class” will match a selector for said class.

String operations in C++ are notoriously slow. When solving the styles of a large page you might have to do hundreds or even thousands of string comparisons if you use a vanilla string implementation. The comparisons are slow because most string implementations allocate their memory on the heap and hold an internal pointer to the actual “char” array, so each comparison has to follow that pointer and compare the memory over the whole string. Cache misses will kill the performance.

In Hummingbird we can’t afford such a naive implementation so we looked for alternatives for how to represent the “id” and “classes” of HTML elements. The most important thing is to have quick comparisons that are vital to the style solving process.

In a game engine where all the content is well known when the game is being built, many developers rely on doing “perfect hashes” of their strings and effectively substituting them everywhere with ids. This is a great solution when you know that the whole pool of strings you’ll encounter is fixed. Unfortunately, such a solution cannot work in Hummingbird where the user can set any id and class in her styles and even generate new ones through JavaScript. Everything has to be dynamic.

We went for a well-known solution called “string interning”, which in essence represents a runtime perfect hash.

In our implementation:

  • All strings of a View (basically an HTML page + scripts) are kept in contiguous memory regions.
  • The addresses of all currently interned strings are stored in a hash set.
  • When we encounter a string to be interned, we check the hash set. If the string is already in some buffer, we return its address; otherwise we copy the characters into a free region of our buffers, put the address in the set and return the newly created interned string.

The InternedString object that we use is simple: it contains just a const char* pointer. We guarantee that interned strings won’t move in memory.

This representation is very efficient:

  • Each interned string consumes memory just once, no matter how many copies there are
  • Two interned strings are the same if and only if they have the same address. This makes comparisons super-fast as we only compare pointers!
  • We still have easy access to the contents of the string via the const char*

This representation gives a great performance boost. Our implementation does not try to reclaim strings that are no longer used. It is extremely unlikely in our usage scenarios to have interned strings take too much memory.
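The scheme described above can be sketched in a few lines. This is a simplified illustration, not Hummingbird’s implementation: InternedString is the name used in the text, but InterningContext, Intern, the FNV-1a hash and the use of a std::deque of std::string in place of hand-managed contiguous buffers are assumptions. The deque is used because appending to it never moves existing elements, so the stored pointers stay valid. Locking is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <string>
#include <unordered_set>

// A handle that is just a pointer; equality is pointer equality.
class InternedString
{
public:
    explicit InternedString(const char* chars) : m_Chars(chars)
    {
        assert(chars != nullptr);
    }
    const char* c_str() const { return m_Chars; }
    bool operator==(const InternedString& other) const
    {
        return m_Chars == other.m_Chars;
    }

private:
    const char* m_Chars;
};

// Hypothetical per-View context owning the characters and the lookup set.
class InterningContext
{
public:
    InternedString Intern(const std::string& text)
    {
        // The set compares by content, so a lookup finds an equal string
        // that was stored earlier, wherever the query characters live.
        auto found = m_Lookup.find(text.c_str());
        if (found != m_Lookup.end())
            return InternedString(*found);

        // Not seen before: store one copy and remember its stable address.
        m_Storage.push_back(text);
        const char* stable = m_Storage.back().c_str();
        m_Lookup.insert(stable);
        return InternedString(stable);
    }

private:
    struct Hash
    {
        std::size_t operator()(const char* s) const
        {
            // FNV-1a over the characters; any decent string hash would do.
            std::size_t h = static_cast<std::size_t>(1469598103934665603ULL);
            for (; *s; ++s)
            {
                h ^= static_cast<unsigned char>(*s);
                h *= static_cast<std::size_t>(1099511628211ULL);
            }
            return h;
        }
    };
    struct Equal
    {
        bool operator()(const char* a, const char* b) const
        {
            return std::strcmp(a, b) == 0;
        }
    };

    std::deque<std::string> m_Storage; // elements never move on push_back
    std::unordered_set<const char*, Hash, Equal> m_Lookup;
};
```

The expensive content comparison happens only once, inside Intern. After that, matching a class name against a selector is a single pointer comparison between two InternedString values.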

Hummingbird is a heavily parallel library, so trying to intern a string from multiple threads is a very real scenario. At the moment the “interning context” is protected by a lock. Each View has its own “interning context”. So instead of having a system-wide interning context that risks taking too much memory and becoming a locking bottleneck, we have View-wide ones that reduce contention and practically eliminate the risk of interned strings taking too much memory even during long sessions.

The results are great with ~30% improved style solving performance compared to a version where simple strings are used.