Rendering HTML at 1000 FPS – Part 2

Squeezing the GPU

The post is also available in the official LensVR blog.

This is the second blog post of the sequence in which I talk about the LensVR rendering engine.

In the first post, I discussed the high-level architecture of the LensVR/Hummingbird rendering. In this post I will get into the specifics of our implementation – how we use the GPU for all drawing. I will also share data from a performance comparison I did with Chrome and Servo Nightly.

Rendering to the GPU

After we have our list of high-level rendering commands, we have to execute them on the GPU. Renoir transforms these commands into low-level graphics API calls (OpenGL, DirectX, etc.). For instance, a “FillRectangle” high-level command becomes a sequence of calls that set vertex/index buffers and shaders, followed by draw calls.
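To make the translation step concrete, here is a minimal sketch in Python (chosen for brevity; the engine itself is C++). The `FillRectangle` dataclass fields, the `lower_fill_rectangle` function and the call names such as `set_vertex_buffer` are hypothetical and only illustrate the idea; they are not Renoir's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FillRectangle:
    # Hypothetical high-level command: position, size and an RGBA color.
    x: float
    y: float
    width: float
    height: float
    color: tuple

def lower_fill_rectangle(cmd):
    """Expand one high-level command into a list of low-level API steps."""
    x0, y0 = cmd.x, cmd.y
    x1, y1 = cmd.x + cmd.width, cmd.y + cmd.height
    vertices = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    indices = [0, 1, 2, 0, 2, 3]  # two triangles covering the quad
    return [
        ("set_vertex_buffer", vertices),
        ("set_index_buffer", indices),
        ("set_shader", "solid_color"),
        ("set_constants", {"color": cmd.color}),
        ("draw_indexed", len(indices)),
    ]

calls = lower_fill_rectangle(FillRectangle(10, 20, 100, 50, (1.0, 0.0, 0.0, 1.0)))
for name, payload in calls:
    print(name, payload)
```

One rectangle already produces five API-level steps, which is why the later sections care so much about reducing the number of calls.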

Renoir “sees” all commands that will happen in the frame and can make high-level decisions to optimize for three important constraints:

  • Minimize GPU memory usage
  • Minimize synchronization with the GPU
  • Minimize rendering API calls

When drawing a web page, certain effects require the use of intermediate render targets. Renoir groups as many of those effects as possible in the same target to minimize the memory used and to reduce render target changes on the GPU, which is a fairly slow operation. It also aggressively caches textures and targets and tries to re-use them to avoid continually creating and destroying resources, which is quite slow.
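The caching idea can be sketched as a small pool keyed by size and format. This is an illustrative Python sketch under my own assumptions: `RenderTargetPool`, the `"rgba8"` format string and the dictionary standing in for a GPU target are hypothetical, not Renoir's real types.

```python
class RenderTargetPool:
    """Hands out idle render targets before creating new ones."""

    def __init__(self):
        self._free = {}   # (width, height, fmt) -> list of idle targets
        self.created = 0  # how many targets were actually allocated

    def acquire(self, width, height, fmt="rgba8"):
        key = (width, height, fmt)
        idle = self._free.get(key)
        if idle:
            return idle.pop()  # re-use: no GPU allocation needed
        self.created += 1
        return {"key": key, "id": self.created}  # stand-in for a real GPU target

    def release(self, target):
        # Keep the target around instead of destroying it.
        self._free.setdefault(target["key"], []).append(target)

pool = RenderTargetPool()
blur_target = pool.acquire(256, 256)
pool.release(blur_target)
shadow_target = pool.acquire(256, 256)  # gets the same target back
print(pool.created)
```

A real pool would also need an eviction policy so that idle targets do not pin GPU memory forever.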

The rendering API commands are sent to the GPU immediately on the rendering thread. There is no “intermediate” command list, as opposed to Chrome, where a dedicated GPU process is in charge of interacting with the GPU. The “path-to-pixel” is significantly shorter in LensVR than in all other web browsers, with far fewer abstraction layers in between, which is one of the keys to its speed.


GPU execution

The GPU works as a consumer of commands and memory buffers generated by the CPU, and it can complete its work several frames after that work has been submitted. So what happens when the CPU tries to modify some memory (say a vertex or index buffer) that the GPU hasn’t processed yet?

Graphics drivers keep track of these situations, called “hazards”, and either stall the CPU until the resource has been consumed or perform a process called “renaming” – basically cloning the resource under the hood and letting the CPU modify the fresh copy. Most of the time the driver will do renaming, but if excessive memory is used, it can also stall.

Neither possibility is great. Resource renaming increases the CPU time required to complete the API calls because of the bookkeeping involved, while a stall will almost certainly introduce serious jank in the page. Newer graphics APIs such as Metal, Vulkan and DirectX 12 let the developer keep track of hazards and avoid them. Renoir was likewise designed to manually track the usage of its resources to prevent renaming and stalls. Thus, it fits the architecture of the modern APIs perfectly. Renoir has native rendering backend implementations for all major graphics APIs and uses the best one for the platform it is running on. For instance, on Windows it directly uses DirectX 11. In comparison, Chrome has to go through an additional abstraction library called ANGLE, which generates DirectX API calls from OpenGL ones – Chrome (which uses Skia) only understands OpenGL at the time of this post.
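One common way to track hazards manually is to keep several copies of a dynamic buffer and remember the frame in which each was last submitted. The Python sketch below illustrates that general technique only; `BufferRing`, `acquire` and the fence-style `gpu_completed_frame` value are hypothetical names, and I am not claiming this is how Renoir does it.

```python
class BufferRing:
    """Picks a buffer the GPU is known to have finished with."""

    def __init__(self, count):
        # last_used[i] is the frame in which buffer i was last submitted (-1 = never).
        self.last_used = [-1] * count

    def acquire(self, current_frame, gpu_completed_frame):
        for index, frame in enumerate(self.last_used):
            if frame <= gpu_completed_frame:
                # The GPU has consumed this buffer, so the CPU may overwrite it.
                self.last_used[index] = current_frame
                return index
        return None  # every buffer is still in flight; the caller decides what to do

ring = BufferRing(2)
print(ring.acquire(current_frame=0, gpu_completed_frame=-1))
print(ring.acquire(current_frame=1, gpu_completed_frame=-1))
print(ring.acquire(current_frame=2, gpu_completed_frame=-1))  # None: both in flight
print(ring.acquire(current_frame=2, gpu_completed_frame=0))   # frame 0 is done
```

Because the application knows which buffers are safe to write, the driver never has to rename a resource or stall the CPU on its behalf.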

Command Batching

Renoir tries very hard to reduce the number of rendering API calls. The process is called “batching” – combining multiple drawn elements in one draw call.

Most elements in Renoir can be drawn with one shader, which makes them much easier to batch together. A classic way of doing batching in games is combining opaque elements together and relying on the depth buffer to draw them correctly in the final image.

Unfortunately, this is much less effective in modern web pages. A lot of elements have transparency or blending effects and they need to be applied in the z-order of the page, otherwise the final picture will be wrong.

Renoir keeps track of how elements are positioned in the page, and if they don’t intersect it batches them together; in that case the z-order no longer breaks batching. The final result is a substantial reduction of draw calls. Renoir also pre-compiles all the required shaders in the library, which significantly improves “first-use” performance. Other 2D libraries like Skia rely on run-time shader compilation, which can be very slow on mobile (in the seconds on first use) and introduce stalls.
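The intersection test that makes this batching safe can be sketched as follows. This is a simplified Python illustration under stated assumptions: elements arrive in z-order, each carries a hypothetical `state` (shader/texture key) and a `rect` of `(x, y, width, height)`, and primitives inside one draw call are blended in submission order. The real batching logic is certainly more involved.

```python
def intersects(a, b):
    """Axis-aligned overlap test for rects given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def build_batches(elements):
    """Group elements (given in z-order) into draw calls without reordering overlaps."""
    batches = []  # each batch: {"state": ..., "items": [...]}
    for elem in elements:
        target = None
        # Walk back from the most recent batch looking for one we can join.
        for batch in reversed(batches):
            if batch["state"] == elem["state"]:
                target = batch
                break
            if any(intersects(elem["rect"], other["rect"]) for other in batch["items"]):
                break  # must be drawn after this batch, so stop searching
        if target is None:
            target = {"state": elem["state"], "items": []}
            batches.append(target)
        target["items"].append(elem)
    return batches

page = [
    {"name": "a", "state": "solid", "rect": (0, 0, 10, 10)},
    {"name": "b", "state": "image", "rect": (20, 0, 10, 10)},
    {"name": "c", "state": "solid", "rect": (40, 0, 10, 10)},  # joins "a": no overlap with "b"
    {"name": "d", "state": "solid", "rect": (25, 0, 10, 10)},  # overlaps "b": needs a new batch
]
for batch in build_batches(page):
    print(batch["state"], [e["name"] for e in batch["items"]])
```

An element only moves into an earlier batch when nothing it overlaps sits in between, so the final image stays identical to drawing everything in strict z-order.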

Results & Comparisons

For a performance comparison I took a page that is used as an example in the WebRender post. I made a small modification, substituting the gradient effect with an “opacity” effect, which is more common and is a good stress test for every web rendering library. I also changed the layout to flex-box, because it’s very common in modern web design. Here is how it looks:


Link to page here.

All tests were performed on Windows 10 on an i7-6700HQ @ 2.6GHz, 16GB RAM, NVIDIA GeForce GTX 960M, at 1080p. I measured only the rendering part in the browsers, using Chrome 61.0.3163.100 (stable) with GPU raster ON, Servo nightly from 14 Oct 2017, and LensVR alpha 0.6.

Chrome version 61.0.3163.100 results

The page definitely takes a toll on Chrome’s rendering. It struggles to maintain 60 FPS, but is significantly faster than the one in the video. The reasons are probably additional improvements in their code and the fact that the laptop I’m testing on is significantly more powerful than the machine used in the original Mozilla video.

Let’s look at the performance chart:


I’m not sure why, but the rasterization always happens on one thread. Both raster and GPU tasks are quite heavy and a bottleneck in the page – they dominate the time needed to finish one frame.

On average for “painting” tasks I get ~5.3ms on the main thread with large spikes of 10+ms, ~20ms on raster tasks and ~20ms on the GPU process. Raster and GPU tasks seem to “flow” between frames and to dominate the frame-time.

Servo nightly (14 Oct 2017) results

Servo fares significantly better rendering-wise, but unfortunately there are some visual artifacts. I think Servo’s Windows port is still a bit shaky.

You can notice that Servo doesn’t achieve 60 FPS either, but that seems to be due to the flex-box layout; we ignore that here and look only at the rendering. The rendering part is measured as “Backend CPU time” by WebRender at ~6.36ms.


LensVR alpha 0.6


Here is one performance frame zoomed in Chrome’s profiling UI, which LensVR uses for its profiling as well.

The rendering-related tasks are the “Paint” one on top, which interprets the Renoir commands, performs batching and executes the graphics API calls, and the “RecordRendering” on the far right, which actually walks the DOM elements and generates Renoir commands.

The sum of both on average is ~2.6ms.


The following graphic shows the “linearized” time for all rendering-related work in a browser. While parallelism will shorten time-to-frame, the overall linear time is a good indicator of battery life impact.


Both WebRender and Renoir, with their novel approaches to rendering, have a clear advantage. LensVR is faster than WebRender, probably because of better command generation and API interaction code. I plan to do a deeper analysis in a follow-up post.



Rendering HTML at 1000 FPS – Part 1

This was originally posted on the LensVR blog.

Part 2 is also available here.

Web page rendering is one of the most interesting and active development areas in computer graphics. There are multiple approaches with pros and cons. In the post I’ll go into details about how we do HTML rendering in Coherent Labs’ Hummingbird and LensVR browser and how it compares to Chrome and Mozilla’s WebRender.

I’ll split the post in two parts. This first one is dedicated to the high-level architecture and how we decide what to render. The second part, “Squeezing the GPU”, will be about how these decisions get implemented to use the GPU for all drawing and will give some performance results I measured.

The renderer described is still experimental for general web pages, but is very successfully deployed in game user interfaces across PC, mobile and consoles. The constraints of these platforms led us to somewhat different design decisions compared to what the folks at Chrome and Mozilla do. Now we are applying this approach to more web pages in LensVR, and feedback is most welcome.

Recently, in an awesome post about Mozilla’s WebRender, Lin Clark not only explained how WebRender works, but also gave a great overview of the way rendering happens in most web browsers. I advise everybody who is interested in how browsers work to read her post.

To quickly recap, I’ll concentrate on what we internally call rendering of the web page. After the Layout engine has positioned all DOM elements on the page and their styles have been calculated, we have to generate an image that the user will actually see.

The rendering is implemented through the Renoir library. Renoir is a 2D rendering library that has all the features required to draw HTML5 content. It is conceptually similar in its role to Mozilla WebRender and Skia (used in Chrome and Firefox before Quantum).

When designing Renoir, performance was our primary goal and we built it around three major paradigms:

  • All rendering on the GPU
  • Parallelism
  • Data-oriented C++ design

We didn’t have all the burden of years of older implementations and could be very bold in the way we do things to achieve our performance goals.

High-level rendering architecture

Most web browsers split the rendering in 2 parts – painting and compositing. The whole page is split in “layers”. A layer is initiated by a DOM element (strictly a stacking context) that has certain styles. The rules differ in implementations, but usually things with 3D transforms, opacity < 1, etc. become layers. You can think of a layer as a texture (an image) that contains part of the web page content.

The layers are individually “painted” by either the GPU or CPU. The painting fills the text, images, effects and so on. After all the needed layers are painted, the whole scene is “composed”. The layers are positioned and the GPU draws them in the final render target which is displayed to the user.

Layers were introduced both as a convenience feature to simplify some effects and as a performance optimization. Often elements move around, but their content doesn’t change, so the browser can skip re-painting a layer whose content is static.

You can see the layers that Chrome produces by enabling Rendering -> Layer Borders in DevTools.


Unfortunately layers also have severe downsides:

  • The implementation of composition is very complex and requires significant computational effort to keep correct. When an element is promoted to “layer”, the browser has to do a lot of calculations and track what other elements it intersects in order to preserve the proper draw order of elements. Otherwise you risk having elements that don’t properly respect the z-index when rendering.
  • Layers consume huge amounts of GPU memory. When you have multiple elements that are layers one on top of the other, you have multiple pixels for each “final” pixel that the user will see. The problem is especially bad in 4K, where a full-screen buffer is 32 MB. Some browsers try to reduce the number of layers by “squashing” them at the expense of even more complex calculations.
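The memory figure is easy to check with a quick calculation. The sketch below assumes a 3840x2160 surface with 4 bytes per pixel (8-bit RGBA); the `buffer_bytes` helper and the five-layer scenario are made up for illustration.

```python
def buffer_bytes(width, height, bytes_per_pixel=4):
    """Size of one uncompressed full-surface buffer."""
    return width * height * bytes_per_pixel

full_screen_4k = buffer_bytes(3840, 2160)
print(full_screen_4k)                   # 33177600 bytes
print(full_screen_4k / (1024 * 1024))   # 31.640625 MiB, i.e. roughly 32 MB
print(5 * full_screen_4k)               # 165888000 bytes for five stacked full-screen layers
```

Every additional full-screen layer costs the same amount again, so a handful of overlapping layers quickly adds up to hundreds of megabytes.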

We decided pretty early that layers were not something we wanted in LensVR – we wanted to conserve memory. This proved a big win as it significantly simplifies the painting code and there is no “composing” step.

Mozilla’s WebRender (used in Servo and Mozilla Quantum) has a similar design decision – they also have only 1 conceptual drawing step and no composing. Every other major OS browser uses layers as of the time of this post.

The risk without layers is having slower frames when large parts of the screen have to be re-painted.

Fortunately GPUs are extremely fast at doing just that. All rendering in Renoir happens exclusively on the GPU. The amount of rendering work that a web page generates is far below what a modern PC or mobile GPU can rasterize. The bottleneck in most web browsers is actually on the CPU side – the generation of commands for the GPU to execute.

Web pages tend to generate a lot of draw calls – if done naively you end up with hundreds of calls per frame, one for each text, image, effect and so on. The performance results can be especially disastrous on mobile, where draw calls are quite expensive.

Renoir implements numerous techniques to reduce the draw call count.

Dirty rectangle tracking

When the page changes due to an animation or another interactive event, usually a small part actually changes visually. We keep a collection of “dirty” rectangles where the page has potentially changed and that have to be re-drawn. Most OS browsers implement some degree of dirty rectangle tracking. Notably Mozilla’s WebRender differs – they re-draw the whole page each frame.

My profiling on typical workloads shows that re-drawing only parts of the screen is still a big win on both the CPU and GPU side, even though more bookkeeping has to be done. The rationale is pretty simple: you do less work compared to re-drawing everything. The important part is keeping the dirty rect tracking quick. Elements that have to be drawn are culled against the current dirty rects and anything that doesn’t intersect is thrown out.
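The culling step can be sketched in a few lines of Python. The element dictionaries, the `(x, y, width, height)` rect layout and the function names are assumptions for illustration, not the engine's actual data structures.

```python
def intersects(a, b):
    """Axis-aligned overlap test for rects given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def cull_against_dirty(elements, dirty_rects):
    """Keep only the elements that touch at least one dirty rectangle."""
    return [
        elem for elem in elements
        if any(intersects(elem["rect"], dirty) for dirty in dirty_rects)
    ]

elements = [
    {"name": "header", "rect": (0, 0, 800, 60)},
    {"name": "spinner", "rect": (380, 300, 40, 40)},
    {"name": "footer", "rect": (0, 540, 800, 60)},
]
dirty = [(370, 290, 60, 60)]  # only the area around the animated spinner changed
print([e["name"] for e in cull_against_dirty(elements, dirty)])
```

With many elements and many dirty rects, a spatial structure would replace the nested loop, but the principle stays the same.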

In Hummingbird we work as part of a game engine, so we strive for sub-millisecond draw times, far less than the 16.6ms per frame that a general browser has to hit, so dirty rects are hugely important. For LensVR, it’s a big win as well because we can quickly finish our work and get the CPU core back to sleep on mobile, which saves battery life!

In the screenshot below, only the highlighted rectangle will be re-drawn in Hummingbird/LensVR. A similar visualization is also available in Chrome under Rendering->Paint flashing.


Rendering commands generation

From the styled and laid-out page we generate a “command buffer” – a list of high-level rendering commands that will later be transformed into low-level graphics API commands. The command buffer generation is kept very simple: the buffer is a linear area of memory, and there are no dynamic allocations or OOP paradigms. All logical objects like images, effects etc. are simple handles (a number). Command generation happens in all browsers and this is an area of continuous improvement.
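To illustrate what “a linear area of memory with handles” means, here is a small Python sketch that packs fixed-size records into one pre-allocated `bytearray`. The record layout (opcode, handle, rectangle), the opcode values and the `CommandBuffer` class are hypothetical; Renoir's real command format is not described in this post.

```python
import struct

CMD_FILL_RECT = 1   # hypothetical opcodes
CMD_DRAW_IMAGE = 2

# One record: opcode (uint16), resource handle (uint16), then x, y, w, h as float32.
RECORD = struct.Struct("<HH4f")

class CommandBuffer:
    def __init__(self, capacity_bytes):
        self._data = bytearray(capacity_bytes)  # allocated once, reused every frame
        self._size = 0

    def push(self, opcode, handle, x, y, w, h):
        if self._size + RECORD.size > len(self._data):
            raise MemoryError("command buffer full")
        RECORD.pack_into(self._data, self._size, opcode, handle, x, y, w, h)
        self._size += RECORD.size

    def reset(self):
        self._size = 0  # start of a new frame: no memory is freed or allocated

    def __iter__(self):
        for offset in range(0, self._size, RECORD.size):
            yield RECORD.unpack_from(self._data, offset)

buf = CommandBuffer(1024)
buf.push(CMD_FILL_RECT, 7, 0.0, 0.0, 100.0, 50.0)
buf.push(CMD_DRAW_IMAGE, 3, 10.0, 10.0, 32.0, 32.0)
for record in buf:
    print(record)
```

Recording a command is just a bounds check and a write at the current offset, and replaying the frame is a linear walk over contiguous memory.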

We kept Renoir a “thin” library, which is different from the approach taken in the Skia 2D rendering library used in Chrome & Mozilla. Skia is a very object-oriented library with complex object lifetimes, interactions and numerous levels of abstraction. We wanted to keep Renoir very lean, which helped us a lot during the optimization phases. Chromium’s “slimming paint” effort is a way to reduce the abstractions and quicken the “command generation” step.


All command generation and later rendering happen in parallel with the other operations in the webpage like DOM manipulations & JavaScript execution. Other browsers also try to get more work off the main thread by parallelizing composition and rasterization. LensVR/Hummingbird go a step further with their task-based architecture, which significantly overlaps computations and uses more CPU cores to finish the rendering faster. Most threads are not “locked” into doing only one specific job, but can do whatever is needed at the moment to get the frame ready as fast as possible. Still, we’re looking to improve this area further as I see possibilities for even better hardware utilization.

In the next post

In part 2 I’ll explain how we utilize the GPU and share some performance comparisons I did between Renoir, Chrome’s rendering and WebRender in Servo. Stay tuned!

The Voxels library

Some days ago I finally released the first public alpha of my Voxels library project. For quite some time I’ve been interested in volume rendering for real-time applications. I believe it has a lot of future applications.

The library is the fruit of some of my work in that field, but alas I work on it only in my extremely limited spare time as I’m concentrated on my company – Coherent Labs. The main ideas behind the library have already been highlighted in my previous posts and the talk I gave at Chaos Group’s CG2 seminar in October 2013. Preparing the library for release has been a much longer process than I expected, but the Windows version at least is finally downloadable from GitHub here, along with a very detailed sample here.

A hand-sculpted surface started as a ball. Voxels supports multiple materials and normal mapping


Some internal detail

The polygonization algorithm used is TransVoxel. I chose it because it is very fast, proven correct and easy to parallelize. Eric Lengyel’s entire Ph.D. thesis on the subject is very interesting and I recommend it to anyone interested not only in volume rendering but in real-time rendering in general. The algorithm addresses one of the most important issues with volume rendering techniques – the need for LOD. Currently the meshes produced by the library are very “organic” in their shape (due to the Marching Cubes roots of the algorithm) and are best suited for terrains and other earth-like surfaces.

My implementation produces correct meshes relatively fast, scales extremely well and tries to keep the memory usage low. Currently I’m using simple RLE compression on the grid, which works surprisingly well, giving very fast run times and large compression ratios of 30x+. Lengyel reports using it in his implementation too, with satisfactory results.
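Run-length encoding works well on voxel data because long stretches of the grid are entirely “outside” or entirely “inside” the surface. The Python sketch below shows the general technique on one row of made-up 8-bit distance values; it is not the library's actual storage format.

```python
def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

# One grid row: empty space, a thin band around the surface, then solid material.
row = [127] * 40 + [-3, -60] + [-128] * 22
encoded = rle_encode(row)
print(encoded)
print(len(row), "values stored as", len(encoded), "runs")
assert rle_decode(encoded) == row
```

Only the thin band of voxels near the surface carries varying values, so the number of runs stays small even for large grids.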

The polygonization process is multi-threaded and uses all the available machine cores. Here there is much room for API improvement to set limits on the resources used and eventually the exact cores employed.

In the sample application I’ve also added an implementation of an octree LOD class that culls blocks of the mesh and, more importantly, decides which blocks to draw at which LOD level and when to draw the transitions (the transitions are meshes that fill the gaps between adjacent blocks of different LOD levels).


I intend to continue the active development of the library. Currently the focus will be adding support for Linux and maybe Mac OS X and improving the polygonization speed even further – especially when editing the voxel grid. The API also needs some polishing – I’m currently working on an integration of the library with an open-source engine and see some issues with it.

I’d also like to update the sample or create a new one that draws all the current state of the mesh in one draw call through some of the indirect rendering APIs.

Feedback is extremely appreciated. If you find the library interesting and would like to use it for something or have any suggestions/ideas – drop me a line.

Rendering experiments framework

Framework available on

I dedicate most of my professional time and 99% of my spare programming time to real-time graphics. Some years ago I started a small framework that I use on a daily basis for all the graphics experiments and demos I do.

Today I open-source this framework in the hopes that it might help somebody else in fast-prototyping something.

General notes

  • The framework is entirely geared towards fast prototyping of graphics techniques and algorithms. The current version was started at least 3-4 years ago and grew organically. Some parts are ancient and taken from previous file collections I used before for prototypes.
  • It is NOT a game engine, it is NOT a full graphics engine, it shouldn’t be used in production.
  • It doesn’t abstract anything related to graphics to leave as much room to experimentation as possible.
  • It is Windows, DirectX 11 only.


The sole goal of the framework is to quickly prototype ideas and algorithms for real-time (usually game) rendering. The framework is currently divided in 4 static libraries:


Contains a base application class that takes care of windows creation, input forwarding, message loop etc. It’s pretty minimal and graphics back-end agnostic.


Contains the classes that initialize an Application with a renderer. Currently only a Dx11 rendering app can be created.


Contains all the graphics stuff. Everything is tightly DX11 bound except the loaders/savers.

  • The DxRenderer class that holds the DX11 device and the immediate context. It creates the default back-buffer and depth-stencil buffer. It also contains a list of all the rendering routines that will execute in turn every frame.
  • DxRenderingRoutine is an abstract class that allows specifying rendering passes. Most of the prototypes I’ve created with the framework are in essence a bunch of inheritors from this class. The routines are registered with the DxRenderer and called in turn each frame.
  • A Camera class for looking around
  • Mesh and Subset classes. A mesh is a vertex buffer and a collection of subsets. Every subset has an index buffer, a material, an OOBB and an AABB.
  • Texture manager – a simple convenience class for loading, creating and managing textures with different formats.
  • Shader manager – a class for compiling and creating shaders from files. It also contains wrappers for easier creation of constant buffers.
  • Material shader manager – can inject in the shader information about the type of material that will be drawn. It inserts in the shader code “static bool” variables depending on the properties of the selected material that can be used for static branching later in the shader code. It also contains a map between a compiled shader for a specific material so that we can easily reuse them.
  • ScreenQuad – simple class for drawing a full-screen quad
  • FrustumCuller – culls subsets provided a view and projection matrix
  • DepthCuller – an unfinished class for software occlusion culling
  • SoftwareRasterizer – an unfinished and super experimental software rasterizer. I think it can currently just draw a couple of triangles.
  • OBJ file loader. Supports most of the OBJ and MTL format. I almost exclusively use the Sponza mesh for testing, so everything used in it is supported.
  • Raw file loader. “Raw” files are just memory dumps with vertex, index data and information about the materials and used textures
  • Raw file saver – saves a raw mesh.


  • Logging – a multi-threaded, fast, easy to use logging system with custom severities, facility information and unlimited arguments for the log message.
  • Alignment – base classes and allocators for aligned types
  • Assertions – custom asserts
  • MathInlines – a couple of math functions to deal with a VS bug explained here.
  • Random number generator
  • STL allocators supporting a custom allocator
  • Some smart pointers for COM interfaces (all D3D objects)

That’s pretty much it. I plan to open-source shortly also some of my demos/experiments so a concrete usage of the framework will be shown there.

Usage & Dependencies

The framework depends on Boost (1.55+) and requires Visual Studio 2012+. To set up the library you need to configure the property sheets it uses to find its dependencies.

The property sheets are located in the “Utilities” folder and are named “PathProperty_template.props”, “Libraries_x86_template.props”, “Libraries_x64_template.props”. You must rename them to “PathProperty.props”, “Libraries_x86.props”, “Libraries_x64.props” and edit them so that they point to your local Boost build. “PathProperty.props” is designed to set the include paths, while the other two set the link libraries for x86 and x64.

The repo also contains glm as a committed third-party dependency. The framework itself doesn’t use it, but it’s widely used in some of the demos I made, so it’s here.

A sample made with the framework - light-pre-pass, motion blur, FXAA



I will continue to use the libs for my Dx11 experiments in the future, so I’ll update them when I need something else or find an issue. I don’t plan to abstract them enough to support OpenGL or OSes other than Windows.

That said, I already need another framework for easily prototyping OpenGL stuff and Linux testing, so a new “OGL” version will probably be born when I have some more time to dedicate to it.


I’m licensing the framework under the “New BSD License”, so you can pretty much do whatever you want with it. If you happen to use something, credit is always appreciated.

Feedback welcome.


Practical Volume Rendering for realtime applications – presentation

In October 2013 I gave a talk at Chaos Group’s CG2 seminar.

Now I share the English version of the slides. The talk briefly introduces the state of volume rendering until now and the potential I see in it. We are seeing more and more uses of volume rendering in games, and many new applications are still emerging. The second part of the slides gives details about the TransVoxel algorithm by Eric Lengyel, my implementation of it and the lessons I learned from that. Some of the blog posts I’ve written are also based on the same research I’m doing.

Soon I plan to publish a C++ library for Volume rendering for games that encompasses everything highlighted in the slides and many other improvements.

Overview of modern volume rendering techniques for games – Part II

This post has been published also in Coherent Labs’s blog – the company I co-founded and work for.

In this blog series I write about some modern volume rendering techniques for real-time applications and why I believe their importance will grow in the future.

If you have not read part one of the series, please check it out here first. It is an introduction to the topic and an overview of volume rendering techniques.

In this second post from our multi-post series on volume rendering for games, I’ll explain the technical basics that most solutions share. Throughout the series I’ll concentrate on ‘realistic’, smooth rendering – not the ‘blocky’ one you can see in games like Minecraft.

Types of techniques

Volume rendering techniques can be divided in two main categories – direct and indirect.

Direct techniques produce a 2D image from the volume representation of the scene. Almost all modern algorithms use some variation of ray-casting and do their calculations on the GPU. You can read more on the subject in the papers/techniques “Efficient Sparse Voxel Octrees” and “Gigavoxels”.

Although direct techniques produce great looking images, they have some drawbacks that hinder their wide usage in games:

  1. Relatively high per-frame cost. The calculations rely heavily on compute shaders and while modern GPUs have great performance with them, they are still effectively designed to draw triangles.

  2. Difficulty to mix with other meshes. For some parts of the virtual world we might still want to use regular triangle meshes. The tools developed for editing them are well-known to artists and moving them to a voxel representation may be prohibitively difficult.

  3. Interop with other systems is difficult. Most physics systems for instance require triangle representations of the meshes.

Indirect techniques on the other hand generate a transitory representation of the mesh. Effectively they create a triangle mesh from the volume. Moving to a more familiar triangle mesh has many benefits.

The polygonization (the transformation from voxels to triangles) can be done only once – on game/level load. After that on every frame the triangle mesh is rendered. GPUs are designed to work well with triangles so we expect better per-frame performance. We also don’t need to do radical changes to our engine or third-party libraries because they probably work with triangles anyway.

In all the posts in this series I’ll talk about indirect volume rendering techniques – both the polygonization process and the way we can effectively use the created mesh and render it fast – even if it’s huge.

What is a voxel?

A voxel is the building block of our volume surface. The name ‘voxel’ comes from ‘volume element’ and is the 3D counterpart of the more familiar pixel. Every voxel has a position in 3D space and some properties attached to it. Although we can have any property we’d like, all the algorithms we’ll discuss require at least a scalar value that describes the surface. In games we are mostly interested in rendering the surface of an object and not its internals – this gives us some room for optimizations. More technically speaking we want to extract an isosurface from a scalar field (our voxels).

The set of voxels that will generate our mesh is usually parallelepipedal in shape and is called a ‘voxel grid’. If we employ a voxel grid the positions of the voxels in it are implicit.

In every voxel, the scalar we set is usually the value of the distance function at the point in space where the voxel is located. The distance function has the form f(x, y, z) = d_xyz, where d_xyz is the shortest distance from the point (x, y, z) in space to the surface. If the voxel is “in” the mesh, then the value is negative.

If you imagine a ball as the mesh in our voxel grid, all voxels “in” the ball will have negative values, all voxels outside the ball positive, and all voxels that are exactly on the surface will have a value of 0.
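The ball example can be written down directly: for a sphere of radius r centered at the origin, the signed distance is sqrt(x² + y² + z²) − r. The Python sketch below samples it on a small grid; the grid size, the radius and the function names are arbitrary choices for illustration.

```python
import math

def sphere_distance(x, y, z, radius):
    """Signed distance to a sphere centered at the origin: negative inside."""
    return math.sqrt(x * x + y * y + z * z) - radius

def sample_grid(size, radius):
    """Fill a size^3 voxel grid, indexed as grid[z][y][x], with the ball centered in it."""
    half = (size - 1) / 2.0
    return [
        [
            [sphere_distance(x - half, y - half, z - half, radius) for x in range(size)]
            for y in range(size)
        ]
        for z in range(size)
    ]

grid = sample_grid(size=9, radius=3.0)
print(grid[4][4][4])  # center of the ball: -3.0
print(grid[4][4][7])  # three voxels from the center, exactly on the surface: 0.0
print(grid[0][0][0])  # a corner of the grid, outside the ball: positive
```

Note that the voxel positions are never stored; they follow from the indices, which is what “implicit” positions in a voxel grid means.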

Cube polygonized with a MC-based algorithm – notice the loss of detail on the edge


Marching cubes

The simplest and most widely known polygonization algorithm is called ‘Marching cubes’. There are many techniques that give better results than it, but its simplicity and elegance are still well worth looking at. Marching cubes is also the base of many more advanced algorithms and will give us a frame in which we can more easily compare them.

The main idea is to take 8 voxels at a time that form the eight corners of an imaginary cube. We work with each cube independently from all others and generate triangles in it – hence we “march” on the grid.

To decide what exactly we have to generate, we use just the signs of the voxels on the corners and form one of 256 cases (there are 2^8 possible cases). A precomputed table of those cases tells us which vertices to generate, where and how to combine them in triangles.
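Building the case number amounts to packing eight sign bits into one byte. In the Python sketch below the corner ordering and the convention that a negative (“inside”) corner sets its bit are my assumptions; published lookup tables each fix their own convention.

```python
def cube_case(corner_values):
    """Pack the signs of the 8 corner values into an index in the range 0..255."""
    case = 0
    for bit, value in enumerate(corner_values):
        if value < 0:          # corner is inside the surface
            case |= 1 << bit
    return case

print(cube_case([1.0] * 8))                                   # 0: cube entirely outside
print(cube_case([-1.0] * 8))                                  # 255: cube entirely inside
print(cube_case([-0.5, 2.0, 2.0, -0.1, 1.0, 1.0, 1.0, 1.0]))  # 9: corners 0 and 3 inside
```

Cases 0 and 255 generate no triangles at all, which is why cubes far from the surface are so cheap to process.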

The vertices are always generated on the edges of the cube and their exact position is computed by interpolating the values in the voxels on the corners of that edge.
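The interpolation is a plain linear one: if the two corners of an edge hold distances d0 and d1 with opposite signs, the surface crosses the edge at t = d0 / (d0 − d1). A minimal Python sketch, with the function name and the tuple-based points chosen for illustration:

```python
def interpolate_vertex(p0, p1, d0, d1):
    """Find the point on edge p0-p1 where the interpolated distance is zero."""
    if d0 == d1:
        raise ValueError("the surface does not cross this edge")
    t = d0 / (d0 - d1)  # fraction of the way from p0 to p1
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))

# Corner p0 is 1 unit inside the surface, corner p1 is 3 units outside.
print(interpolate_vertex((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), -1.0, 3.0))  # (0.25, 0.0, 0.0)
```

Because the vertex can only slide along an edge, features that lie strictly inside a cube cannot be represented exactly.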

I’ll not go into the details of the implementation – it is pretty simple and widely available on the Internet, but I want to underline some points that are valid for most of the MC-based algorithms.

  1. The algorithm expects a smooth surface. Vertices are never created inside a cube but always on the edges. If a sharp feature happens to be inside a cube (very likely) then it will be smoothed out. This makes the algorithm good for meshes with more organic forms, like terrain, but unsuitable for surfaces with sharp edges, like buildings. To produce a sufficiently sharp feature you’d need a very high resolution voxel grid, which is usually unfeasible.

  2. The algorithm is fast. The very difficult calculation of what triangles should be generated in which case is pre-computed in a table. The operations on each cube itself are very simple.

  3. The algorithm is easily parallelizable. Each cube is independent of the others and can be calculated in parallel. The algorithm belongs to the “embarrassingly parallel” family.

After marching all the cubes, the mesh is composed of all the generated triangles.

Marching cubes tends to generate many tiny triangles. This can quickly become a problem if we have large meshes.

If you plan to use it in production, beware that it doesn’t always produce ‘watertight’ meshes – there are configurations that will generate holes. This is pretty unpleasant and is fixed by later algorithms.

In the next post of the series I’ll discuss the requirements of a good volume rendering implementation for a game in terms of polygonization speed and rendering performance, and I’ll look into ways to achieve them with more advanced techniques.


Cyril Crassin, Fabrice Neyret, Sylvain Lefebvre, Elmar Eisemann. 2009. GigaVoxels : Ray-Guided Streaming for Efficient and Detailed Voxel Rendering.

Samuli Laine, Tero Karras. 2010. Efficient Sparse Voxel Octrees.

Paul Bourke. 1994. Polygonising a scalar field.

Marching cubes on Wikipedia.


I gave a talk entitled “Practical Volume Rendering for real-time applications” at Chaos Group‘s annual CG2 conference in Sofia.

Available here in Bulgarian:

Overview of modern volume rendering techniques for games – Part I

This post has been published also in Coherent Labs’s blog – the company I co-founded and work for.

A couple of months ago Sony revealed their upcoming MMO title “EverQuest Next”. What made me really excited about it was their decision to base their world on a volume representation. This enables them to show amazing videos like this one. I’ve been very interested in volume rendering for a long time and in this blog series I’d like to point out the techniques that are most suitable for games today and in the near future.

In a series I’ll explain the details of some of the algorithms as well as their practical implementations.

This first post introduces the concept of volume rendering and its greatest benefits for games.

Volume rendering is a well-known family of algorithms that project a set of 3D samples onto a 2D image. It is used extensively in a wide range of fields such as medical imaging (MRI, CT visualization), industry, biology, geophysics etc. Its use in games, however, has been relatively modest, with some interesting cases in titles like Delta Force, Outcast, C&C Tiberian Sun and others. The use of volume rendering faded until recently, when we saw an increase in its popularity and a sort of “rediscovery”.

A voxel-based scene with complex geometry


In games we are usually interested just in the surface of a mesh – its internal composition is seldom of interest – in contrast to medical applications. Relatively few applications have chosen volume rendering in place of the usual polygon-based mesh representations. Volumes, however, have two characteristics that are becoming increasingly important for modern games – destructibility and procedural generation.

Games like Minecraft have shown that players are very much engaged by the possibility of creating their own worlds and shaping them the way they want. On the other hand, titles like Red Faction place an emphasis on the destruction of the surrounding environment. Both these games, although very different, have essentially the same technology requirement.

Destructibility (and of course constructability) is a property that game designers are actively seeking.

One way to make meshes modifiable is to apply the modifications directly to traditional polygonal models. This has proved to be quite a complicated matter. Middleware solutions like NVIDIA Apex address polygon mesh destructibility, but they usually still require input from a designer, and the construction part remains largely unsolved.

Minecraft unleashed the creativity of users


Volume rendering can help a lot here. A 3D grid of volume elements (voxels) is a much more natural representation of the mesh than a collection of triangles. The volume already contains the important information about the shape of the object, and its modification is close to what happens in the real world: we either add or subtract volumes from one another. Many artists already work in a similar way in tools like Zbrush.

Voxels themselves can contain any data we like, but usually they define a distance field – that means that every voxel encodes a value indicating how far we are from the surface of the mesh. Material information is also embedded in the voxel. With such a definition, constructive solid geometry (CSG) operations on voxel grids become trivial. We can freely add or subtract any volume we’d like from our mesh. This brings a tremendous amount of flexibility to the modelling process.
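To illustrate why CSG becomes trivial on distance fields, here is a minimal Python sketch. It assumes a signed distance convention where negative values are inside the object, and all function names are hypothetical. On a voxel grid the same `min`/`max` operations would simply be applied to the values stored in each voxel.

```python
import math


def sphere(center, radius):
    """Signed distance field of a sphere: negative inside, positive outside (assumed convention)."""
    def field(point):
        return math.dist(point, center) - radius
    return field


def union(a, b):
    # A point is inside the union if it is inside either volume.
    return lambda point: min(a(point), b(point))


def subtract(a, b):
    # A point stays inside only if it is inside a and outside b.
    return lambda point: max(a(point), -b(point))


big = sphere((0.0, 0.0, 0.0), 2.0)
hole = sphere((2.0, 0.0, 0.0), 1.0)
carved = subtract(big, hole)
merged = union(big, hole)

still_solid = carved((0.0, 0.0, 0.0)) < 0   # centre of the big sphere is untouched
carved_away = carved((1.5, 0.0, 0.0)) > 0   # this point was removed by the hole
```

Note that combining fields with `min` and `max` keeps the inside/outside classification correct, but the resulting values are not always exact distances to the new surface.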

Procedural generation is another important feature that has many advantages. First and foremost, it can save a lot of human effort and time. Level designers can generate a terrain procedurally and then just fine-tune it instead of having to start from absolute zero and work out every tedious detail. This saving is especially relevant when very large environments have to be created, like in MMORPG games. With the new generation of consoles with more memory and power, players will demand much more and better content. Only with procedural content generation will the creators of virtual worlds be able to achieve the variety needed for future games.

In short, procedural generation means that we create the mesh from a mathematical function that has relatively few input parameters. No sculpting is required by an artist at least for the first raw version of the model.
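As a toy example of such a function, the sketch below defines a terrain by a density function with three parameters and samples it into a small voxel grid. The function, its parameters and the sign convention (negative means solid) are all made up for illustration; real generators typically combine several octaves of noise.

```python
import math


def terrain_density(x, y, z, amplitude=4.0, frequency=0.25, base_height=8.0):
    """Negative below the terrain surface (solid), positive above it (air)."""
    height = base_height + amplitude * math.sin(frequency * x) * math.cos(frequency * z)
    return y - height


def sample_grid(size, **params):
    """Evaluate the density at every integer point of a size^3 grid, indexed as grid[x][y][z]."""
    return [[[terrain_density(x, y, z, **params) for z in range(size)]
             for y in range(size)]
            for x in range(size)]


grid = sample_grid(16)
ground_is_solid = grid[0][0][0] < 0
sky_is_empty = grid[0][15][0] > 0
```

The whole 16×16×16 volume is described by just three numbers, and changing them regenerates a different terrain without any manual sculpting.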

Developers can also achieve high compression ratios and save a lot of download bandwidth and disk space by using procedural content generation. The surface is represented implicitly, with functions and coefficients, instead of heightmaps or 3D voxel grids (two popular methods for surface representation used in games). We already see huge savings from procedurally generated textures – why shouldn’t the same apply to 3D meshes?

The use of volume rendering is not restricted to meshes. Today we see other uses too. Some of them include:

Global illumination (see the great work in Unreal Engine 4)

Fluid simulation

GPGPU ray-marching for visual effects

In the next posts in the series I’ll give a list and details on modern volume rendering algorithms that I believe have the greatest potential to be used in current and near-future games.

