Temporary allocations in C++

Heap allocations in C++ are notoriously expensive when they hit the generic allocator. Many high-performance software projects employ custom allocation strategies tailored to a specific scenario that speed allocations up. A great overview of allocation strategies common in games is given by Stefan Reinalter on his Molecular Musings blog.

In this post I'll talk specifically about temporary allocations and how to optimize STL allocations. The STL is difficult to use right because its types do many heap allocations and the allocator is a template parameter, which makes types with different allocators incompatible. C++11 and C++14, however, added many new features that in my opinion make the STL a viable option again, even for performance-critical code.

Temporary objects usually survive only a small scope within the execution of the program, but this scope (function) might be called many times. If objects in it always allocate on the heap, we might suffer a substantial performance hit.

Imagine this function:

void OnTaskComplete()
{
  std::vector<Task*> tasksToSchedule;
  for(auto& task : m_WaitingTasks)
  {
    if (task->CanBeScheduled())
    {
      tasksToSchedule.push_back(task);
      // Do some more stuff
    }
  }
  for (auto& task : tasksToSchedule)
  {
    // Schedule task
  }
}

The function in the example might get called after a task in a job system completes, to check whether some new tasks can now be scheduled. It might get called hundreds of times in a frame. The vanilla vector there can hurt performance, as it will do a heap allocation every time at least one task has to be scheduled.

We know that the lifetime of the object is really narrow – it is temporary to the scope of the function. Let’s make it a lot faster.

We will employ a custom allocator for temporary objects that outperforms the standard heap allocations by 7x-10x.

A “linear allocator” grabs a predefined chunk of memory up front and, on each request, hands back the next piece of that chunk. We don’t manage anything except where we currently are in the buffer. We can however “reset” the allocator back, which effectively allows us to reuse the memory. It is common in games to use such an allocator for all allocations within a frame and reset it at the beginning of the next one. This allows for very fast O(1) allocations within the frame and an O(1) super-quick bulk “deallocation”. We can only get in trouble if we consume more memory within the frame than the allocator has to offer. Assertions are in order.

#include <cassert> // assert
#include <cstddef> // size_t
#include <memory>  // std::align

class LinearAllocator
{
public:
  LinearAllocator(size_t size)
    : m_Ptr{ static_cast<char*>(TEMP_MALLOC(size)) }
    , m_TotalSize{ size }
    , m_FreeSpace{ size }
  {}

  ~LinearAllocator()
  {
    TEMP_FREE(m_Ptr);
  }

  void* Allocate(size_t size, unsigned alignment /* power of 2 */)
  {
    assert((alignment & (alignment - 1)) == 0);

    // std::align bumps currentPtr up to the requested alignment and
    // subtracts any padding it skipped from m_FreeSpace.
    auto currentPtr = static_cast<void*>(m_Ptr + (m_TotalSize - m_FreeSpace));
    auto retPtr = std::align(alignment, size, currentPtr, m_FreeSpace);

    if (!retPtr)
    {
      assert(false && "Linear allocator full!");
      // no space
      return nullptr;
    }

    m_FreeSpace -= size; // the padding was already subtracted by std::align

    return retPtr;
  }

  void Free()
  {
    // do nothing
  }

  void Reset(size_t freeSpace)
  {
    m_FreeSpace = freeSpace;
  }

  size_t CurrentFreeSpace() const { return m_FreeSpace; }

private:
  char* m_Ptr;
  size_t m_TotalSize;
  size_t m_FreeSpace;
};
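The workhorse in Allocate is std::align: it advances the given pointer to the next multiple of the alignment and subtracts only the skipped padding from the remaining space, returning nullptr if the request no longer fits. Here is a standalone sketch of one allocation step; the BumpAllocate helper is my own illustration, not part of the allocator above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// One allocation step of a linear/bump allocator: align the cursor,
// then advance it past the payload. Returns nullptr if `size` no
// longer fits after alignment.
void* BumpAllocate(void*& cursor, std::size_t& space,
                   std::size_t size, std::size_t alignment)
{
    // std::align moves `cursor` forward to the alignment boundary and
    // subtracts only the padding from `space`.
    void* p = std::align(alignment, size, cursor, space);
    if (p)
    {
        cursor = static_cast<char*>(cursor) + size; // step past the payload
        space -= size;                              // pay for the payload itself
    }
    return p;
}
```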

In the systems I work with, we have multiple threads that work on “tasks”. This makes defining a frame somewhat difficult, so in this sample we will employ a scope object that resets the linear allocator in its destructor.

Our allocator will also be thread-local, so that every thread has its own copy.

// The per-thread allocator the scope operates on; its definition and
// initialization are left to the application.
extern thread_local LinearAllocator* tlsLinearAllocator;

struct TempAllocatorScope
{
public:
  TempAllocatorScope()
    : m_Space(tlsLinearAllocator->CurrentFreeSpace())
  {}

  ~TempAllocatorScope()
  {
    tlsLinearAllocator->Reset(m_Space);
  }
private:
  size_t m_Space;
};
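To see the watermark mechanics in isolation, here is a self-contained sketch. The CountingAllocator stand-in is purely illustrative (it just tracks a counter), but the scope struct is the same as above, and nested scopes roll back correctly:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-in with the same CurrentFreeSpace/Reset interface
// as LinearAllocator; it only tracks how much space is notionally left.
struct CountingAllocator
{
    std::size_t freeSpace = 1024;

    std::size_t CurrentFreeSpace() const { return freeSpace; }
    void Reset(std::size_t space) { freeSpace = space; }
    void Allocate(std::size_t size) { freeSpace -= size; }
};

thread_local CountingAllocator* tlsLinearAllocator = nullptr;

// Same idea as in the post: remember the watermark on entry,
// roll back to it on exit.
struct TempAllocatorScope
{
    TempAllocatorScope() : m_Space(tlsLinearAllocator->CurrentFreeSpace()) {}
    ~TempAllocatorScope() { tlsLinearAllocator->Reset(m_Space); }

private:
    std::size_t m_Space;
};
```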

Finally, we will write an STL-compatible allocator that forwards all calls to our linear one, which lets us use it with any STL container.
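A minimal sketch of such an adapter, using the C++11 minimal-allocator interface (std::allocator_traits fills in the rest). The names StlLinearAllocator and tlsLinearAllocator are my own, std::malloc/std::free stand in for TEMP_MALLOC/TEMP_FREE, and the embedded LinearAllocator is a condensed copy of the one above so the snippet is self-contained:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <vector>

// Condensed version of the linear allocator from above,
// with std::malloc/std::free standing in for TEMP_MALLOC/TEMP_FREE.
class LinearAllocator
{
public:
    explicit LinearAllocator(std::size_t size)
        : m_Ptr{ static_cast<char*>(std::malloc(size)) }
        , m_TotalSize{ size }
        , m_FreeSpace{ size }
    {}
    ~LinearAllocator() { std::free(m_Ptr); }

    void* Allocate(std::size_t size, std::size_t alignment)
    {
        void* current = m_Ptr + (m_TotalSize - m_FreeSpace);
        void* ret = std::align(alignment, size, current, m_FreeSpace);
        if (!ret)
            return nullptr; // allocator full
        m_FreeSpace -= size;
        return ret;
    }

    void Reset(std::size_t space) { m_FreeSpace = space; }
    std::size_t CurrentFreeSpace() const { return m_FreeSpace; }

private:
    char*       m_Ptr;
    std::size_t m_TotalSize;
    std::size_t m_FreeSpace;
};

// One allocator per thread; 1 MB is an arbitrary budget for this sketch.
thread_local LinearAllocator tlsLinearAllocator{ 1024 * 1024 };

// STL-compatible allocator that forwards to the thread-local linear
// allocator. deallocate is a no-op: memory is reclaimed in bulk on Reset.
template <typename T>
struct StlLinearAllocator
{
    using value_type = T;

    StlLinearAllocator() = default;
    template <typename U>
    StlLinearAllocator(const StlLinearAllocator<U>&) {}

    T* allocate(std::size_t n)
    {
        void* p = tlsLinearAllocator.Allocate(n * sizeof(T), alignof(T));
        assert(p && "Linear allocator full!");
        return static_cast<T*>(p);
    }
    void deallocate(T*, std::size_t) {} // freed in bulk on Reset
};

// All instances share the same thread-local arena, so they compare equal.
template <typename T, typename U>
bool operator==(const StlLinearAllocator<T>&, const StlLinearAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const StlLinearAllocator<T>&, const StlLinearAllocator<U>&) { return false; }
```

With this in place, the vector from the first example becomes std::vector<Task*, StlLinearAllocator<Task*>>, and a TempAllocatorScope at the top of the function rolls all of its allocations back on exit.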

The complete test example is available here.
The sample lets us switch between the linear allocator and the default one to measure the gains. On my machine the speed-up was between 7x and 10x with the VS 2015 runtime.
