Bringing fast HTML5 to embedded devices

Graphical user interfaces are now all around us – no longer only on our PCs, phones and tablets. GUIs are also in our cars, home appliances and even elevators. As an engineer I was curious how those are developed for embedded systems. I’d like to share what I found. Unsurprisingly, there are two schools of thought – build the UI with native, hardcoded C/C++ code (often generated from visual tools), or use a data-driven, markup-based language combined with scripting: HTML5 & JavaScript. In that regard the embedded GUI ecosystem is similar to PC and mobile. Both approaches have upsides and downsides.

The most common opinion is that C and C++ based solutions are faster, while HTML5-based ones are too slow for all but the beefiest devices, which relegates them to very expensive cars or machinery. Qt, one of the popular proponents of the C++ camp, even published a whitepaper that compares the two approaches and concludes that the native solution has superior performance. As a C++ developer myself (doing a lot of low-level optimization work) I often tend to share this limited mindset. However this is dogma – what is the reality of the trade-offs of both approaches?


The touchscreen in a Tesla automobile. Image courtesy of Tesla

There is a very big advantage in creating UIs with standard HTML5 & JavaScript: the immense industry know-how in the technology, hundreds of production-grade libraries and tools and, most importantly, the ability for front-end developers to work on embedded UIs. The end results of such democratization are better interfaces and happier customers.

With the non-native approach, the problem to overcome is performance. But I believe the issue is not the choice of technology per se, but the specific implementation. Most HTML5-based solutions use Chromium, which simply is not made for embedded architectures. The reluctance of developers to put millions of lines of code on such a limited architecture, and to use a software solution that is famous for eating gigabytes of RAM like candy, is more than reasonable. The concern is much the same as the one we faced in video games when we started Coherent Labs in 2012. Developers saw the advantages of HTML5 as a UI technology and wanted to use it, but there was no solution that could cope with the performance requirements of games. We solved these problems and built Coherent GT and Hummingbird, currently powering hundreds of games.

We can draw a clear parallel between the requirements of video games and embedded systems:

  • Need for efficient rendering – in games there is very little time to finish the work before the frame has to be displayed to maintain 30 or 60 frames per second. In embedded you have low-powered (read: slow) systems that still have to achieve a similar FPS.
  • Low memory footprint – in games developers want to devote most of the memory to textures, sounds, 3D models, and so on to create the most appealing gameplay and visuals, so the user interface should take very little memory. Embedded devices usually have little RAM to begin with.
  • Rich content – to create a good user experience, the interfaces of both games and embedded devices need animations, audio and video. The bar for UI complexity and the user expectations for visuals are already very high.

In theory, Hummingbird on embedded will give us the best of both worlds – an awesome workflow and the required performance.


An entertainment system would be a great fit for dynamic HTML5 technology. Image courtesy of BMW

I intend to test Hummingbird on some embedded devices and measure how it performs. The first step will be to port it to an embedded OS – I’ll probably start with Linux on ARM and a Raspberry Pi, an easily accessible platform with good tools for a proof of concept.

If you are an embedded UI developer – what are your biggest issues and what platforms do you target? If you’re interested in helping me with the experiment, don’t hesitate to write a comment or reach me on Twitter @stoyannk!


Small vector optimization

One of the key performance optimization techniques involves reducing the number of dynamic memory allocations that a program does. The reasons are:

  • Generic memory allocations are relatively slow
  • Heap-allocated objects reduce the effectiveness of CPU caches (misses are more probable)

There are numerous techniques to alleviate allocations, but not allocating memory from the heap in the first place would be ideal. This is not possible in most cases when the data we’ll manipulate is truly dynamic, but we can identify common cases where 99% of the time an allocation can be avoided – the size is often the same.

C++ strings are a prime example of objects that tend to allocate too often. A naive implementation always keeps the string data on the heap, which requires at least one allocation and risks cache misses when code accesses that content. Developers noticed this, and now effectively all std::string implementations implement the so-called “small string optimization”. Each string object contains a small inline buffer that holds the data if it fits, and the string allocates only if it doesn’t. As most strings in practice are small, this very often saves an allocation and improves cache locality.

Small Vectors

The same technique can be implemented for the more generic vector type. It’s also widely implemented in many libraries. A very inspirational lecture on the successes of using “small vector optimization” is given by Chandler Carruth in his talk “High Performance Code 201: Hybrid Data Structures” where he discusses its application in LLVM.

Instead of always allocating the buffer for the objects in the vector on the heap, we have a small inline piece of memory that can hold N objects. The N constant depends on the usage and should be carefully selected depending on the context to trade between allocations and the potential memory waste if we have fewer than N objects.

I decided to apply the optimization in certain areas of our Hummingbird software and measure the gains. In our “standard” library I implemented a std::vector substitute creatively named “dyn_array” from dynamic array. It has the same public interface as std::vector, so they are easily interchangeable.

It’s declared like this:

template<typename T, typename Allocator, size_t InlineCapacity = 0>
class dyn_array

The interesting part is the InlineCapacity param, which defines how many objects will be inline in the structure itself. When InlineCapacity is 0, dyn_array works like a standard vector – everything is allocated through the Allocator – usually from some heap.

To save memory within the dyn_array object, the inline buffer is used also as the pointer to the heap memory when necessary. The declaration is this beauty:

using DataStorage = char [Max_eval<sizeof(T) * InlineCapacity, sizeof(T*)>::value];
alignas(Max_eval<alignof(T), alignof(T*)>::value) DataStorage m_DataStorage = {0};

Edit: @lectem suggested that the same effect is achievable with std::aligned_storage

Max_eval is a meta-function, which returns the maximum of two compile-time values. The DataStorage should be big enough to hold at least a pointer, while the field itself has to be aligned properly for the inline objects.

I use the current capacity and the InlineCapacity constants to decide how to interpret the m_DataStorage. If the CurrentCapacity <= InlineCapacity, then the memory is “inline” and it’s interpreted as an array, otherwise the bytes are interpreted as a T*.
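
A minimal sketch of this storage scheme follows. It is not the actual dyn_array code: it assumes trivially copyable element types, uses std::malloc instead of an Allocator, and the names SmallBuffer, is_inline and heap_ptr are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical, simplified illustration: one byte buffer holds either the
// inline elements or a pointer to heap memory, depending on the capacity.
template <typename T, std::size_t InlineCapacity>
class SmallBuffer {
    static_assert(std::is_trivially_copyable<T>::value,
                  "this sketch only handles trivially copyable types");

    static constexpr std::size_t kInlineBytes = sizeof(T) * InlineCapacity;
    static constexpr std::size_t kBytes =
        kInlineBytes > sizeof(T*) ? kInlineBytes : sizeof(T*);
    static constexpr std::size_t kAlign =
        alignof(T) > alignof(T*) ? alignof(T) : alignof(T*);

    alignas(kAlign) unsigned char m_Storage[kBytes] = {};
    std::size_t m_Size = 0;
    std::size_t m_Capacity = InlineCapacity;

    // Reads the bytes of the storage as a T* (valid only when not inline).
    T* heap_ptr() const {
        T* p;
        std::memcpy(&p, m_Storage, sizeof(p));
        return p;
    }

    void grow() {
        const std::size_t newCapacity = m_Capacity ? m_Capacity * 2 : 4;
        T* fresh = static_cast<T*>(std::malloc(newCapacity * sizeof(T)));
        assert(fresh != nullptr);
        // Copy the old elements first: the inline bytes are about to be
        // overwritten by the pointer value.
        std::memcpy(fresh, data(), m_Size * sizeof(T));
        if (!is_inline()) std::free(heap_ptr());
        std::memcpy(m_Storage, &fresh, sizeof(fresh));
        m_Capacity = newCapacity;
    }

public:
    SmallBuffer() = default;
    SmallBuffer(const SmallBuffer&) = delete;
    SmallBuffer& operator=(const SmallBuffer&) = delete;
    ~SmallBuffer() { if (!is_inline()) std::free(heap_ptr()); }

    // Same rule as in the text: capacity <= InlineCapacity means "inline".
    bool is_inline() const { return m_Capacity <= InlineCapacity; }
    std::size_t size() const { return m_Size; }

    T* data() {
        return is_inline() ? reinterpret_cast<T*>(m_Storage) : heap_ptr();
    }

    void push_back(const T& value) {
        if (m_Size == m_Capacity) grow();
        std::memcpy(data() + m_Size, &value, sizeof(T));
        ++m_Size;
    }

    T get(std::size_t index) {
        assert(index < m_Size);
        T out;
        std::memcpy(&out, data() + index, sizeof(T));
        return out;
    }
};
```

With SmallBuffer<int, 2>, the first two push_back calls stay inside the object and only the third one touches the heap. No separate “is inline” flag is stored, the capacity alone tells how to read the bytes.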

When looking at the potential usages in our code, the first target I had in mind was the CSS declarations for animations (Hummingbird is an HTML rendering engine – like a web browser). Users can have as many animation properties within a declaration as they want, so the values have to be dynamic, but they very rarely exceed 1-4. Some declarations are also enums, which we limit to 1 byte. This means that on 32-bit targets we can squeeze 4 values within a pointer and on 64-bit targets 8 – more than enough in 90+% of the cases.

I wrote a small template meta-function that simplifies setting the inline size when you want it to fit exactly in one pointer. When the objects you are going to have in the dyn_array are smaller than a pointer, they are essentially free memory-wise.

The meta-function looks like this:

template<typename T>
struct FitInPointer
{
   static const auto value = sizeof(T*) / sizeof(T);
};

and is used like this:

dyn_array<Value, CSSAllocator, csl::FitInPointer<Value>::value> Values;
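
As a quick sanity check of what the meta-function yields, here is a small sketch. The FillMode and TwoPointers types are hypothetical stand-ins, not types from Hummingbird, and the first check assumes a platform where all object pointers have the same size.

```cpp
#include <cstddef>
#include <cstdint>

template <typename T>
struct FitInPointer {
    static const auto value = sizeof(T*) / sizeof(T);
};

// Hypothetical 1-byte enum standing in for a CSS keyword value.
enum class FillMode : std::uint8_t { None, Forwards, Backwards, Both };

// Hypothetical type that is larger than a pointer.
struct TwoPointers { void* a; void* b; };

// One byte per value: 4 values on 32-bit targets, 8 on 64-bit targets.
static_assert(FitInPointer<FillMode>::value == sizeof(void*),
              "a 1-byte enum fits once per pointer byte");

// The integer division rounds down to 0 for types bigger than a pointer,
// so such a dyn_array would have no inline capacity at all.
static_assert(FitInPointer<TwoPointers>::value == 0,
              "types larger than a pointer get no inline slots");
```

The second assertion is worth keeping in mind: the helper only pays off for element types that are no larger than a pointer.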

So far I’ve applied dyn_array to all animation CSS properties and the results have been more than encouraging – a 20% reduction of dynamic allocations in tests and a 10% reduction in page load time.

I’m now looking at other places where the optimization will have good effects.

Future direction

The next step will be to implement a debugging mechanism that detects vectors that are often re-allocated in test scenarios. They will be analyzed as candidates for inline storage. The LLVM folks implemented this to great effect. It’ll be a piece of cake for us, as adding the required debug code in dyn_array is far easier than modifying an STL vector implementation.

Rendering HTML at 1000 FPS – Part 2

Squeezing the GPU

The post is also available in the official LensVR blog.

This is the second blog post of the sequence in which I talk about the LensVR rendering engine.

In the first post, I discussed the high-level architecture of the LensVR/Hummingbird rendering. In this post I will get into the specifics of our implementation – how we use the GPU for all drawing. I will also share data on a performance comparison I did with Chrome and Servo Nightly.

Rendering to the GPU

After we have our list of high-level rendering commands, we have to execute them on the GPU. Renoir transforms these commands to low-level graphics API calls (like OpenGL, DirectX etc.). For instance a “FillRectangle” high level command will become a series of setting vertex/index buffers, shaders and draw calls.

Renoir “sees” all commands that will happen in the frame and can make high-level decisions to optimize for three important constraints:

  • Minimize GPU memory usage
  • Minimize synchronization with the GPU
  • Minimize rendering API calls

When drawing a web page, certain effects require the use of intermediate render targets. Renoir will group as many of those effects as possible in the same target to minimize the memory used and reduce changing render targets for the GPU, which is a fairly slow operation. It’ll also aggressively cache textures and targets and try to re-use them to avoid continually creating and destroying resources, which is also quite slow.

The rendering API commands are sent to the GPU immediately on the rendering thread. There is no “intermediate” command list, as opposed to Chrome, where a dedicated GPU process is in charge of interacting with the GPU. The “path-to-pixel” is significantly shorter in LensVR compared to all other web browsers, with far fewer abstraction layers in between, which is one of the keys to the speed it gets.

Rendering in cohtml.png

GPU execution

The GPU works as a consumer of commands and memory buffers generated by the CPU, and it can complete its work several frames after that work has been submitted. So what happens when the CPU tries to modify some memory (say a vertex or index buffer) that the GPU hasn’t processed yet?

Graphics drivers keep track of these situations, called “hazards”, and either stall the CPU until the resource has been consumed or do a process called “renaming” – basically cloning the resource under the hood and letting the CPU modify the fresh copy. Most of the time the driver will do renaming, but if excessive memory is used, it can also stall.

Neither possibility is great. Resource renaming increases the CPU time required to complete the API calls because of the bookkeeping involved, while a stall will almost certainly introduce serious jank in the page. Newer graphics APIs such as Metal, Vulkan and DirectX 12 let the developer keep track of hazards and avoid them. Renoir was likewise designed to manually track the usage of its resources to prevent renaming and stalls. Thus, it perfectly fits the architecture of the modern APIs. Renoir has native rendering backend implementations for all major graphics APIs and uses the best one for the platform it is running on. For instance on Windows it directly uses DirectX 11. In comparison, Chrome has to go through an additional abstraction library called ANGLE, which generates DirectX API calls from OpenGL ones – Chrome (which uses Skia) only understands OpenGL at the time of this post.
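
To make the idea of manual hazard tracking more concrete, here is one common way to do it: keep a small ring of buffers and write to a slot only once the GPU has finished the frame that last used it. This is a generic sketch and not Renoir’s implementation. The BufferRing class and its methods are hypothetical, and the “GPU completed” notification stands in for whatever fence or query the graphics API provides.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical ring of per-frame buffer slots. Frame numbers start at 1.
template <std::size_t FramesInFlight>
class BufferRing {
    // Frame that last submitted work using each slot (0 = never used).
    std::array<std::uint64_t, FramesInFlight> m_LastUsedFrame{};
    // Newest frame the GPU is known to have finished.
    std::uint64_t m_CompletedFrame = 0;

public:
    // Called when a fence reports that the GPU finished `frame`.
    void OnGpuCompleted(std::uint64_t frame) {
        if (frame > m_CompletedFrame) m_CompletedFrame = frame;
    }

    // Returns the slot that `frame` may write to, or -1 if the GPU may
    // still be reading it. A caller could then wait on the fence or grow
    // the ring, which makes the cost explicit instead of hiding it in
    // the driver.
    int AcquireForWrite(std::uint64_t frame) {
        const std::size_t slot = frame % FramesInFlight;
        if (m_LastUsedFrame[slot] > m_CompletedFrame) return -1;
        m_LastUsedFrame[slot] = frame;
        return static_cast<int>(slot);
    }
};
```

With two slots, frames 1 and 2 acquire a slot immediately, while frame 3 is refused until the GPU reports that frame 1 is done.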

Command Batching

Renoir tries very hard to reduce the amount of rendering API calls. The process is called “batching” – combining multiple drawn elements in one draw call.

Most elements in Renoir can be drawn with one shader, which makes them much easier to batch together. A classic way of doing batching in games is combining opaque elements together and relying on the depth buffer to draw them correctly in the final image.

Unfortunately, this is much less effective in modern web pages. A lot of elements have transparency or blending effects and they need to be applied in the z-order of the page, otherwise the final picture will be wrong.

Renoir keeps track of how elements are positioned in the page, and if they don’t intersect it batches them together, because in that case the z-order no longer breaks batching. The final result is a substantial reduction of draw calls. It also pre-compiles all the required shaders in the library, which significantly improves “first-use” performance. Other 2D libraries like Skia rely on run-time shader compilation, which can be very slow on mobile (in the seconds on first use) and introduce stalls.
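
The intersection rule can be illustrated with a small greedy batcher. This is a sketch of the general technique under my own assumptions, not Renoir’s code: elements arrive in z-order, each has a hypothetical stateKey describing the GPU state it needs, and an element may join an earlier batch with the same key only if it overlaps nothing in the batches it would “jump over”.

```cpp
#include <cstddef>
#include <vector>

struct Rect { float x, y, w, h; };

bool Intersects(const Rect& a, const Rect& b) {
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

// stateKey is a hypothetical id for the shader/texture state of an element.
struct DrawElement { Rect bounds; int stateKey; };

// One batch corresponds to one draw call.
struct Batch { int stateKey; std::vector<Rect> rects; };

bool OverlapsBatch(const Batch& batch, const Rect& r) {
    for (const Rect& other : batch.rects)
        if (Intersects(other, r)) return true;
    return false;
}

// Elements must be given back to front (z-order).
std::vector<Batch> BuildBatches(const std::vector<DrawElement>& elements) {
    std::vector<Batch> batches;
    for (const DrawElement& e : elements) {
        bool placed = false;
        // Walk from the newest batch to the oldest one.
        for (std::size_t i = batches.size(); i-- > 0;) {
            if (batches[i].stateKey == e.stateKey) {
                // Everything drawn after this batch was checked below and
                // does not overlap e, so drawing e earlier is safe.
                batches[i].rects.push_back(e.bounds);
                placed = true;
                break;
            }
            // Different state and overlapping: e must be drawn after it.
            if (OverlapsBatch(batches[i], e.bounds)) break;
        }
        if (!placed) batches.push_back(Batch{e.stateKey, {e.bounds}});
    }
    return batches;
}
```

For four elements with keys 1, 2, 1, 1 where only the last one overlaps the second, this produces three batches instead of four: the third element joins the first batch, while the last one has to start a new batch on top.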

Results & Comparisons

For a performance comparison I took a page that is used as an example in the WebRender post. I made a small modification, substituting the gradient effect with an “opacity” effect, which is more common and is a good stress test for every web rendering library. I also changed the layout to flex-box, because it’s very common in modern web design. Here is how it looks:

Website gif

Link to page here.

All tests were performed on Windows 10 on an i7-6700HQ @ 2.6GHz, 16GB RAM, NVIDIA GeForce GTX 960M, at 1080p. I measured only the rendering part in the browsers, using Chrome 61.0.3163.100 (stable) with GPU raster ON, Servo nightly from 14 Oct 2017, and LensVR alpha 0.6.

Chrome version 61.0.3163.100 results

The page definitely takes a toll on Chrome’s rendering. It struggles to maintain 60 FPS, but is significantly faster than the one in the video. The reasons are probably additional improvements in their code and the fact that the laptop I’m testing on is significantly more powerful than the machine used in the original Mozilla video.

Let’s look at the performance chart:

Chrome_perf

I’m not sure why, but the rasterization always happens on one thread. Both raster and GPU tasks are quite heavy and a bottleneck in the page – they dominate the time needed to finish one frame.

On average for “painting” tasks I get ~5.3ms on the main thread with large spikes of 10+ms, ~20ms on raster tasks and ~20ms on the GPU process. Raster and GPU tasks seem to “flow” between frames and to dominate the frame-time.

Servo nightly (14 Oct 2017) results

Servo fares significantly better rendering-wise, but unfortunately there are some visual artifacts. I think Servo’s port for Windows is still a bit shaky.

You can notice that Servo doesn’t achieve 60 FPS either, but that seems to be due to the flex-box layout, so we ignore that and look only at the rendering. The rendering part is measured as “Backend CPU time” by WebRender at ~6.36ms.

Servo GPU

LensVR alpha 0.6

LensVR Rendering

Here is one performance frame zoomed inside Chrome’s profiling UI, which LensVR uses for its profiling as well.

The rendering-related tasks are the “Paint” one on top, which interprets the Renoir commands, performs batching and executes the graphics API calls, and the “RecordRendering” on the far right, which actually walks the DOM elements and generates Renoir commands.

The sum of both on average is ~2.6ms.

Summary

The following graphic shows the “linearized” time for all rendering-related work in each browser. While parallelism will shorten time-to-frame, the overall linear time is a good indicator of battery life impact.

chart

Both WebRender and Renoir with their novel approaches to rendering have a clear advantage. LensVR is faster compared to WebRender, probably because of a better command generation and API interaction code. I plan to do a deeper analysis in a follow-up post.

 

Rendering HTML at 1000 FPS – Part 1

This was originally posted on the LensVR blog.

Part 2 is also available here.

Web page rendering is one of the most interesting and active development areas in computer graphics. There are multiple approaches with pros and cons. In the post I’ll go into details about how we do HTML rendering in Coherent Labs’ Hummingbird and LensVR browser and how it compares to Chrome and Mozilla’s WebRender.

I’ll split the post in two parts, this first one is dedicated to the high level architecture and how we decide what to render. The second part – “Squeezing the GPU” will be about how these decisions get implemented to use the GPU for all drawing and will give some performance results I measured.

The renderer described is still experimental for general web pages, but is very successfully deployed in game user interfaces across PC, mobile and consoles. The constraints of these platforms led us to somewhat different design decisions compared to what the folks at Chrome and Mozilla do. Now we are applying this approach to more web pages in LensVR, and feedback is most welcome.

Recently, in an awesome post about Mozilla’s WebRender, Lin Clark not only explained how WebRender works, but also gave a great overview of the way rendering happens in most web browsers. I advise everybody who is interested in how browsers work to read her post.

To quickly recap, I’ll concentrate on what we internally call rendering of the web page. After the Layout engine has positioned all DOM elements on the page and their styles have been calculated, we have to generate an image that the user will actually see.

DOMandRendering.jpg

The rendering is implemented through the Renoir library. Renoir is a 2D rendering library that has all the features required to draw HTML5 content. It is conceptually similar in its role to Mozilla WebRender and Skia (used in Chrome and Firefox before Quantum).

When designing Renoir, performance was our primary goal and we built it around three major paradigms:

  • All rendering on the GPU
  • Parallelism
  • Data-oriented C++ design

We didn’t have all the burden of years of older implementations and could be very bold in the way we do things to achieve our performance goals.

High-level rendering architecture

Most web browsers split the rendering in 2 parts – painting and compositing. The whole page is split in “layers”. A layer is initiated by a DOM element (strictly a stacking context) that has certain styles. The rules differ in implementations, but usually things with 3D transforms, opacity < 1, etc. become layers. You can think of a layer as a texture (an image) that contains part of the web page content.

The layers are individually “painted” by either the GPU or CPU. The painting fills the text, images, effects and so on. After all the needed layers are painted, the whole scene is “composed”. The layers are positioned and the GPU draws them in the final render target which is displayed to the user.

Layers were introduced both as a convenience feature to simplify some effects and as a performance optimization. Often elements move around, but their content doesn’t change, so the browser can skip re-painting a layer whose content is static.

You can see the layers that Chrome produces by enabling from DevTools, Rendering -> Layer Borders.

Layers_chrome

Unfortunately, layers also have severe downsides:

  • The implementation of composition is very complex and requires significant computational effort to keep correct. When an element is promoted to “layer”, the browser has to do a lot of calculations and track what other elements it intersects in order to preserve the proper draw order of elements. Otherwise you risk having elements that don’t properly respect the z-index when rendering.
  • Layers consume huge amounts of GPU memory. When you have multiple elements that are layers one on top of the other, you have multiple pixels for each “final” pixel that the user will see. The problem is especially bad in 4K, where a full-screen buffer is 32 MB. Some browsers try to reduce the number of layers by “squashing” them at the expense of even more complex calculations.
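
The memory figure is easy to verify. The sketch below assumes a 3840x2160 target with 4 bytes per pixel (8-bit RGBA), and the LayerBytes helper is a hypothetical name used only for this calculation.

```cpp
#include <cstdint>

// Bytes needed for `layers` full-screen buffers.
// Assumption: 4 bytes per pixel (8-bit RGBA) unless stated otherwise.
constexpr std::uint64_t LayerBytes(std::uint64_t width,
                                   std::uint64_t height,
                                   std::uint64_t layers = 1,
                                   std::uint64_t bytesPerPixel = 4) {
    return width * height * bytesPerPixel * layers;
}

// One 4K buffer: 33,177,600 bytes, roughly 31.6 MiB.
static_assert(LayerBytes(3840, 2160) == 33177600ULL, "one 4K RGBA8 buffer");

// Five stacked full-screen layers: 165,888,000 bytes, roughly 158 MiB.
static_assert(LayerBytes(3840, 2160, 5) == 165888000ULL, "five 4K layers");
```

So a handful of overlapping full-screen layers already costs well over a hundred megabytes of GPU memory before any actual content textures are counted.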

We decided pretty early that layers were not something we want in LensVR – we wanted to conserve memory. This proved a big win as it simplifies significantly the painting code and there is no “composing” step.

Mozilla’s WebRender (used in Servo and Mozilla Quantum) has a similar design decision – they also have only 1 conceptual drawing step and no composing. Every other major OS browser uses layers as of the time of this post.

The risk without layers is having slower frames when large parts of the screen have to be re-painted.

Fortunately GPUs are extremely fast at doing just that. All rendering in Renoir happens exclusively on the GPU. The amount of rendering work that a web page generates is far below what a modern PC or mobile GPU can rasterize. The bottleneck in most web browsers is actually on the CPU side – the generation of commands for the GPU to execute.

Web pages tend to generate a lot of draw calls – if done naively you end up with hundreds of calls per frame, one for each piece of text, image, effect and so on. The performance results can be especially disastrous on mobile where draw calls are quite expensive.

Renoir implements numerous techniques to reduce the draw call count.

Dirty rectangle tracking

When the page changes due to an animation or another interactive event, usually a small part actually changes visually. We keep a collection of “dirty” rectangles where the page has potentially changed and that have to be re-drawn. Most OS browsers implement some degree of dirty rectangle tracking. Notably Mozilla’s WebRender differs – they re-draw the whole page each frame.

My profiling on typical workloads shows that re-drawing only parts of the screen is still a big win on both the CPU and the GPU side, even though more bookkeeping has to be done. The rationale is pretty simple: you do less work compared to re-drawing everything. The important part is keeping the dirty rect tracking quick. Elements that have to be drawn are culled against the current dirty rects and anything that doesn’t intersect is thrown out.
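
The culling step can be sketched in a few lines. This is a simplified illustration with hypothetical Rect and Element types, not the engine’s code, and it uses a plain loop where a real implementation would likely use something smarter for many rectangles.

```cpp
#include <vector>

// Axis-aligned rectangle; right and bottom are exclusive.
struct Rect { int left, top, right, bottom; };

bool Intersects(const Rect& a, const Rect& b) {
    return a.left < b.right && b.left < a.right &&
           a.top < b.bottom && b.top < a.bottom;
}

// Hypothetical element record: an id plus its bounds on the page.
struct Element { int id; Rect bounds; };

// Returns the ids of the elements that touch at least one dirty rect.
// Everything else is skipped for this frame.
std::vector<int> CollectElementsToRedraw(const std::vector<Element>& elements,
                                         const std::vector<Rect>& dirtyRects) {
    std::vector<int> toDraw;
    for (const Element& e : elements) {
        for (const Rect& dirty : dirtyRects) {
            if (Intersects(e.bounds, dirty)) {
                toDraw.push_back(e.id);
                break;  // one hit is enough, do not add the id twice
            }
        }
    }
    return toDraw;
}
```

The output keeps the input order, so the surviving elements are still drawn in their original z-order.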

In Hummingbird we work as part of a game engine, so we strive for sub-millisecond draw times, far less than the 16.6ms per-frame that a general browser has to get, so dirty rects are hugely important. For LensVR, it’s a big win as well because we can quickly finish our work and get the CPU core back to sleep on mobile, which saves battery life!

In the screenshot below, only the highlighted rectangle will be re-drawn in Hummingbird/LensVR. A similar visualization is also available in Chrome under Rendering->Paint flashing.

dirty_rect.png

Rendering commands generation

From the styled and laid-out page we generate a “command buffer” – a list of high-level rendering commands that will later be transformed into low-level graphics API commands. The command buffer generation is kept very simple: the buffer is a linear area of memory, and there are no dynamic allocations or OOP paradigms. All logical objects like images, effects etc. are simple handles (a number). Command generation happens in all browsers and is an area of continuous improvement.

We kept Renoir a “thin” library, which is different from the approach taken in the Skia 2D rendering library used in Chrome & Mozilla. Skia is a very object-oriented library with complex object lifetimes, interactions and numerous levels of abstraction. We wanted to keep Renoir very lean, which helped us a lot during the optimization phases. Chromium’s “slimming paint” effort is a way to reduce the abstractions and quicken the “command generation” step.
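
A linear command buffer of this kind might look roughly like the sketch below. It is an illustration under my own assumptions rather than Renoir’s actual layout: the command types, the 4 KB fixed capacity and the visitor-based Replay are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

enum class CommandType : std::uint32_t { FillRectangle, DrawImage };

// Logical objects are plain handles (numbers), never owning pointers.
using ImageHandle = std::uint32_t;

struct FillRectangleCmd { float x, y, w, h; std::uint32_t rgba; };
struct DrawImageCmd { float x, y; ImageHandle image; };

// Hypothetical fixed-size linear buffer: a type tag followed by the
// command payload, written back to back with no dynamic allocations.
class CommandBuffer {
    alignas(8) unsigned char m_Memory[4096];
    std::size_t m_Used = 0;

public:
    template <typename Cmd>
    bool Push(CommandType type, const Cmd& cmd) {
        static_assert(std::is_trivially_copyable<Cmd>::value,
                      "commands must be plain data");
        const std::size_t needed = sizeof(type) + sizeof(Cmd);
        if (m_Used + needed > sizeof(m_Memory)) return false;  // full
        std::memcpy(m_Memory + m_Used, &type, sizeof(type));
        std::memcpy(m_Memory + m_Used + sizeof(type), &cmd, sizeof(Cmd));
        m_Used += needed;
        return true;
    }

    std::size_t BytesUsed() const { return m_Used; }
    void Reset() { m_Used = 0; }  // reuse the same memory next frame

    // Walks the buffer front to back and hands each command to `visitor`,
    // which needs an operator() overload per command struct.
    template <typename Visitor>
    void Replay(Visitor&& visitor) const {
        std::size_t offset = 0;
        while (offset < m_Used) {
            CommandType type;
            std::memcpy(&type, m_Memory + offset, sizeof(type));
            offset += sizeof(type);
            switch (type) {
                case CommandType::FillRectangle: {
                    FillRectangleCmd cmd;
                    std::memcpy(&cmd, m_Memory + offset, sizeof(cmd));
                    offset += sizeof(cmd);
                    visitor(cmd);
                    break;
                }
                case CommandType::DrawImage: {
                    DrawImageCmd cmd;
                    std::memcpy(&cmd, m_Memory + offset, sizeof(cmd));
                    offset += sizeof(cmd);
                    visitor(cmd);
                    break;
                }
            }
        }
    }
};
```

Recording a frame is then a series of Push calls, and the rendering side replays the bytes in order. Resetting the buffer is a single assignment, which is part of what makes this layout cheap.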

Parallelism

All command generation and later rendering happen in parallel with the other operations in the webpage like DOM manipulations & JavaScript execution. Other browsers also try to get more work off the main thread by parallelizing composition and rasterization. LensVR/Hummingbird go a step further with their task-based architecture, which overlaps significantly computations and uses more CPU cores to finish the rendering faster. Most threads are not “locked” in doing only one specific job, but can do whatever is needed at the moment to get the frame ready as fast as possible. Still we’re looking to improve this area further as I see possibilities for even better hardware utilization.

In the next post

In part 2 I’ll explain how we utilize the GPU and share some performance comparisons I did between Renoir, Chrome’s rendering and WebRender in Servo. Stay tuned!

Compile-time check on task parameters (part 2)

In my previous post in the series I wrote how in our HTML rendering engine we try to avoid accidental errors when passing parameters to tasks. A task is a unit of work that might happen on a different thread. Specifically we want to avoid passing objects of types that are not designed to be kept in a task. The previous post highlights the rationale behind the design of the system, I’ll only discuss the implementation here.

The syntax for launching the task is the following:

The syntax is very close to a normal C++ lambda, slightly modified to perform the parameter validation, which is hidden behind the TASK_PARAMS macro. At compile-time it’ll walk all the variables passed to it and make sure that their types are explicitly allowed to be passed to a task. All other types will produce a meaningful compilation error.

The macro is expanded to a ValidateTaskParameters function that looks like this:

The function inspects all parameters through variadic template expansion and performs a series of compile-time checks of their types. The meta-programming templates are somewhat annoying, but worth the complexity in this case.

The first condition for instance says “if the type is a pointer and it wasn’t explicitly allowed to be passed as pointer – generate error”. We usually don’t allow naked pointers to be passed around in tasks, but if the developer knows what she is doing, she can force-allow it.

Marking types is done with template tagging hidden behind a macro for convenience.

Additional macros are available for marking other ways to share an object: by shared pointer, by weak pointer etc. There is also a tag and macro that forbid passing a type to a task altogether.

To recap, our system forces developers to think carefully about which types can be passed around in tasks and reduces the chances of accidental errors. The implementation warns at the earliest possible time, during compilation, and has no run-time cost and only a modest compilation-time cost.
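
The checks described above could be sketched roughly as follows. This is not the engine’s actual code: only the names TASK_PARAMS and ValidateTaskParameters come from the text, while the trait names, the ALLOW_IN_TASK_* macros and the decision to accept arithmetic types by default are my assumptions for the illustration.

```cpp
#include <type_traits>

// Hypothetical tag traits. A type opts in by specializing them.
template <typename T> struct AllowedInTaskByValue : std::false_type {};
template <typename T> struct AllowedInTaskByPointer : std::false_type {};

// Hypothetical convenience macros that hide the template tagging.
#define ALLOW_IN_TASK_BY_VALUE(Type) \
    template <> struct AllowedInTaskByValue<Type> : std::true_type {}
#define ALLOW_IN_TASK_BY_POINTER(Type) \
    template <> struct AllowedInTaskByPointer<Type> : std::true_type {}

// Pointers need an explicit "by pointer" tag on the pointee type.
// Other types must be arithmetic (an assumption) or tagged "by value".
template <typename T>
constexpr bool IsValidTaskParameter() {
    using Decayed = typename std::decay<T>::type;
    using Pointee = typename std::remove_cv<
        typename std::remove_pointer<Decayed>::type>::type;
    return std::is_pointer<Decayed>::value
               ? AllowedInTaskByPointer<Pointee>::value
               : (std::is_arithmetic<Decayed>::value ||
                  AllowedInTaskByValue<Decayed>::value);
}

template <typename T>
constexpr bool CheckOne() {
    static_assert(IsValidTaskParameter<T>(),
                  "This type is not marked as allowed to be passed to a task");
    return true;
}

// The pack expansion instantiates CheckOne for every argument type, so
// the first offending type stops the build with the message above.
template <typename... Ts>
void ValidateTaskParameters(const Ts&...) {
    const bool checks[] = {true, CheckOne<Ts>()...};
    (void)checks;
}

#define TASK_PARAMS(...) ValidateTaskParameters(__VA_ARGS__)

// Example types (hypothetical).
struct Color { unsigned char r, g, b, a; };
ALLOW_IN_TASK_BY_VALUE(Color);

struct Node { int id; };  // not tagged, so Node* is rejected

inline void Example(const Color& color, Node* node) {
    TASK_PARAMS(color, 42);  // compiles: tagged type and an arithmetic value
    // TASK_PARAMS(node);    // would not compile: naked pointer to Node
    (void)node;
}
```

All the work happens in static_assert and template instantiation, so the generated code for a valid call is an empty function that the compiler can drop entirely.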

Suggestions and comments are appreciated, please share if you have tackled a similar problem.

Web Tools for Application UI

Yesterday at GDC 17, Andreas Fredriksson from Insomniac gave a fascinating talk about their experience using web tools (HTML, JS) to create AAA game editing tools. You can see the slides here: https://deplinenoise.wordpress.com/2017/03/03/slides-insomniacs-web-tools-postmortem/.

I believe that using web tools for application UI development is a good solution. At Coherent Labs, along with our main middlewares, we develop the Coherent Editor, which is a game UI development tool written with web tech, like Insomniac’s.

I want to give a summary of our experience developing the tool and how it compares to Insomniac’s. The Editor is still in heavy development and we are happy with how it’s shaping up and with the development process we built around it.

The Coherent Editor in version 1.5.3.4 looks like this:

coherent_editor

The final output is a game UI in standard HTML/CSS that can be consumed in Coherent GT or Hummingbird. This particular sample will look like this in the game:

data_binding

The Coherent Editor is currently ~100K lines of JavaScript. It’s smaller than Insomniac’s tools, but still a very large application by standard-JS-app sizes.

JavaScript gives us the same benefits that Andreas has highlighted + some more:

  • Super-quick iteration
  • We can leverage JS libraries
  • Exercises our own runtime – Coherent GT
  • Can output optimal HTML/CSS for our runtime
  • Directly interfaces with our products
  • Can have game-specific features

Finding great JavaScript developers seems easier than finding C++ devs. We leverage all our C++ devs only for the development of our runtimes.

Andreas highlights many issues they encountered while using the web stack in their development process. I’ll sum up how we solve them in our team:

  • Don’t use Chrome/CEF/Firefox/Edge etc.

The Coherent Editor runs on top of Coherent GT. Here we have the advantage that we create our own technology. 5 years ago when we started Coherent Labs, we based our first generation product on Chromium (the project Chrome builds upon). We quickly realised this was not such a good idea. Chrome is an amazing beast but does too many things, changes too often and breaks things all the time.

With our own technology we keep features stable and concentrate on what makes sense for games. Generally GT is an order of magnitude faster rendering-wise compared to Chrome. Chrome has a super complicated out-of-process architecture that is useless for games and hogs memory like no other.

Andreas notes that in the end they had to freeze the Chrome version they used and that a stable custom runtime would have been better. When we started the Editor there were debates if it should be based on Chrome/Firefox, but fortunately we didn’t go that road.

  • Extend communication JS <-> C++

We have a very simple data binding layer in GT that allows communication and data exchange between JavaScript and C++. All the OS-specific functionality is implemented directly in the Editor. An API is exported from C++ to JS and it can do everything it needs – loading/saving files, launching external tools, importing from Photoshop/Illustrator and so on. This removes a huge amount of complexity: when you need something written in native code, you simply write it and use it from JS.

Performance-heavy tasks can also go to other threads in C++, we are not limited to the threads JS allows.

  • Test all the things

We also realized early on that a lot of JS code quickly becomes very brittle. Fortunately JS is also easy to test. We added Selenium support to Coherent GT and now the Editor team writes tests for all their features, which has helped a lot with the stability of the tool and our confidence in it.

  • Use declarative data binding and components

The best code is the one that does not exist. GT has a declarative data-binding layer that allows UI developers to NOT write any JS code and attach fields, properties etc. directly to the C++ data model. It can also instantiate UI elements (components) in a “for”-cycle way.

Think of an “Open file” dialog, where the “model” is the list of files read from C++ and the dialog is directly populated with the “File” components in the UI. The “File” component is a small piece of HTML with an icon, text, properties etc. This approach saves a ton of time and reduces the risk of errors.

  • Typescript

The Coherent Editor is written in TypeScript, which is also what Insomniac ended up doing. Specialized scripts run autonomously to recompile and redeploy the changed JS. It happens pretty quickly, at least in our application, and doesn’t seem to impact iteration time significantly. I guess it depends on the overall project structure though.

  • GC configuration

We have configs on the JS GC that make it run less often during heavy tasks and allow it to do its work when the Editor is relatively idle. This increases the peak memory usage, but eliminates hiccups.

Conclusions

Using web-only tech with a standard browser is very, very tough to get right for a large Application UI shell. What saved us is that we have our own technology that solves the downsides of a “pure” browser-based solution.

  • Performance was not a problem – GT is a game UI runtime designed for consoles so on development machines, the Coherent Editor UI flies.
  • Data-binding really makes a difference. We designed it to make it easy for games to communicate with the UI and in essence NOT write any JavaScript. Separating the UI development from the “backend” speeds up development x10.
  • If something doesn’t work as we like – we can change it. Although Chrome is OSS, making changes there is incredibly time-consuming and risky (they break it the day after). Believe me, I’ve gone down that path.

We still have a lot of work to do on the Coherent Editor. There are operations that can bring it to its knees (try loading 4000 images and go grab a coffee) but they are mostly related to handling specific cases rather than to the overall architecture. We haven’t yet achieved the complexity of Insomniac’s tools and I hope it’ll hold when we do.

There’s a lot of merit in using HTML/JS/CSS for large applications. I know of other companies that have used web tech for their tools. If you have a story to share, please do.