Skip to main content

Stack allocated containers

No matter what fancy allocator you come up with, memory allocation will always be expensive. In order to reduce memory allocations I have been using stack-allocated containers for the past four-five years and I think it has worked really well, so I thought I'd write a post about them here. These methods have been used in all our games (Sprinkle, Granny Smith, Smash Hit as well as our new project).

Using containers is really convenient and often necessary. Take the array for example:

MyArray<Object> array;

Everybody's got one. The problem with these are that you need to allocate memory when putting stuff in them. In many cases you know on beforehand approximately how many objects you want to put in there, reducing it to a single allocation:

MyArray<Object> array;
array.reserve(50);

But if this code is in a function that is called frequently it would be much nicer if those first fifty objects were reserved on the stack instead of the heap, like this:

MyArray<Object, 50> array;

Now, the array object would have built-in storage for fifty objects and still be able to grow beyond that using regular heap allocations. Great! Or? What if you want to pass such an array by refence to a method? Say, a string splitting method:

void splitString(const String& del, MyArray<Object, 50>& result);

The obviously problem is that the output array now has to be stack allocated for exactly fifty objects. If we try to pass another array it won't compile, because C++ requires arguments with matching template arguments:

MyArray<Object, 8> result;
String str="this is a test";
str.split(" ", result);

Compiler error!

This effectively kills the entire charm of using template arguments for the array size. So how can we design a container class that can be passed around by reference while still offering stack allocation with a template argument?

What I've done is to create a base class, without stack allocation that can be used as a regular array:

template<class T>
class MyArray
{
    ...
    void* mData;
};

And then a subclass for the stack-allocated version:

template<class T, int size>
class MyArrayStack : public MyArray
{
    MyArrayStack()
    {
        mData = mStackStorage;
    }
    ...
    char mStackStorage[size*sizeof(T)];
};

The problem now becomes resizing the array at the base class, because the base class won't know wether the data storage is heap or stack allocated, and we don't want to have virtual methods and a vtable for such a lightweight class. Therefore, the base class needs to be aware of the subclass and avoid freeing memory when it is stack-allocated.

Fortunately this is not very complicated, since the stack allocation will always be placed immediately after the array object itself in memory, so we can just look at the pointer when resizing. If the storage pointer is right after the object itself in memory we have a stack allocation:

void resize(int newSize)
{
    ... allocate heap memory and copy over contents from mData
    if (mData != ((char*)this)+sizeof(MyArray))
        free(mData);  
}

Now we can modify the string splitting method to accept the base class:

void splitString(const String& delimiter, MyArray& result);

This will enable us to pass any stack allocated size array we want:

MyArrayStack<String, 8> result;
str.splitString(" ", result);

or

MyArrayStack<String, 128> result;
str.splitString(" ", result);

Compiler happy!

So the next big question would be – is this safe? There is still a small (well microscopic) chance that a non-stack allocated array object lines up right at the end of the last stack frame, and at the next memory adress is our heap-allocated storage data for it. In practice though this won't happen, because in standard memory allocators there is a header before the allocated data. Even if you use your own allocator without a header, this won't happen, because the memory area it is using would still need to be allocated by the OS, which will put a header in front of it.

I'm currently using this for arrays, sets, hash tables as well as memory streams, memory buffers and a few other things. It has been incredibly useful and probably one of the most important optimizations I have ever added to my framework. This effectively avoids memory allocations altogether without the hassle of using the typical "maxSize" for method arguments. I'm currently not using it for string objects (which all have a fixed stack allocated storage that depends on the project), but I'm kind of tempted to refactor that.

Comments

  1. How about move semantics? Do you deep copy in that case or disallow moving all together?

    You also need to be carfull with alignment for stack array. Do you use the new std::aligned_storage<> for this?

    ReplyDelete
    Replies
    1. Stack memory would be properly aligned if we change mStackStorage to be an array of type T rather than char. No need for STL.

      Delete
  2. Great article. Any chance for more complete example of the code ?

    ReplyDelete
    Replies
    1. Check out LLVM's SmallVector:

      http://llvm.org/docs/doxygen/html/SmallVector_8h_source.html

      Delete
  3. This comment has been removed by the author.

    ReplyDelete
  4. > ...chance that a non-stack allocated array object lines up right at the end of the last stack frame...

    Actually, I don't believe it's possible, ever. Stack grows backwards, from higher addresses to lower. The pointer in your case precedes the stack-allocated buffer, so the pointer would have to be out of stack space first, which would cause an immediate fault. And your object may not be the first thing allocated on the stack because there's at least the stack frame of the thread bootstrap (your thread function needs to return somewhere, right). So you're safe on that front.

    I'm personally a bit wary of allocating on stack directly though, I like to have a separate stack allocator. It gives you finer control over stack size (no need to change the main thread stack size, but annoying in multithreaded environments without fast TLS)

    ReplyDelete

Post a Comment

Popular posts from this blog

Bokeh depth of field in a single pass

When I implemented bokeh depth of field I stumbled upon a neat blending trick almost by accident. In my opinion, the quality of depth of field is more related to how objects of different depths blend together, rather than the blur itself. Sure, bokeh is nicer than gaussian, but if the blending is off the whole thing falls flat. There seems to be many different approaches to this out there, most of them requiring multiple passes and sometimes separation of what's behind and in front of the focal plane. I experimented a bit and stumbled upon a nice trick, almost by accident.

I'm not going to get into technical details about lenses, circle of confusion, etc. It has been described very well many times before, so I'm just going to assume you know the basics. I can try to summarize what we want to do in one sentence – render each pixel as a discs where the radius is determined by how out of focus it is, also taking depth into consideration "somehow".

Taking depth into…

Screen Space Path Tracing – Diffuse

The last few posts has been about my new screen space renderer. Apart from a few details I haven't really described how it works, so here we go. I split up the entire pipeline into diffuse and specular light. This post will focusing on diffuse light, which is the hard part.

My method is very similar to SSAO, but instead of doing a number of samples on the hemisphere at a fixed distance, I raymarch every sample against the depth buffer. Note that the depth buffer is not a regular, single value depth buffer, but each pixel contains front and back face depth for the first and second layer of geometry, as described in this post.

The increment for each step is not view dependant, but fixed in world space, otherwise shadows would move with the camera. I start with a small step and then increase the step exponentially until I reach a maximum distance, at which the ray is considered a miss. Needless to say, raymarching multiple samples for every pixel is very costly, and this is without …

A better depth buffer for raymarching

When doing any type of raymarching over a depth buffer, it is very easy to determine if there is no occluder – the depth in the buffer is farther away than the current point on the ray. However, when the depth in the buffer is closer you might be occluded or you might not, depending on a) the thickness of the occluder and b) if there are any other occluders behind the first one and their thickness. It seems most people assume a) is either infinite or a constant value and b) is ignored alltogether.

Since my new renderer is entirely based around screen space raymarching I wanted to improve on this to make it more accurate. This has been done before, but mostly in the context of order independent transparency (I think).

Let's look at a scene where the occluders are assumed to have infinite depth (I have tweaked the lighting for more distinct shadows to get a better look at raymarching artefacts, so the lighting does not exactly match the environment in these screenshot).


At a first …