Tuesday, March 13, 2018

Hot reloading hardcoded parameters

Here is a trick that I cannot take any credit for, but that I finally implemented. I remember reading about it online several years ago, but I cannot find the reference again (it might have been on mollyrocket), so I'll write up the idea:

Everyone uses hardcoded parameters sometimes because it's fast and easy:

float someValue = 5.0f;

Once a parameter is in the code, sooner or later you will probably want to tune it to some kind of sweet spot. With a hardcoded parameter that process often means recompiling and restarting many times to try out different values (unless you have implemented code hot reloading, in which case it still means recompiling). A popular approach is to add some form of config file to get rid of the recompile step. Config files can be hot reloaded to also get rid of the restart step, but they require some extra work for each parameter: you need to name it, and you need to add it to the config file.

The idea of parameter hot reloading is to use the code itself as the config file. The file is already there, the context and naming are already there, the initial value is already there, and once you're done tweaking, the result is already in your code!

This can be done by wrapping the tweakable, hardcoded parameter in a macro:

float someValue = TWEAK(5.0f);

The macro expands to a function call that looks something like this, using the __FILE__ and __LINE__ built-ins:

float TweakValue(float v, const char* fileName, int lineNumber);

This function stores each tweakable value in a hash table keyed on file and line number, so it can be accessed quickly. The really sweet part is that since we know the file and line number, we can periodically (say once a frame, or using some file modification event) check if the source file has changed, and when it changes, just parse out the new value. Note that this is rather trivial, since at this point we know exactly what line to look at and how to parse it, because the value will be wrapped in parentheses right after the word TWEAK.
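
A minimal sketch of what that could look like (the storage, the key format and the re-parse helper below are my own assumptions, not a description of the original implementation):

#include <cstdio>
#include <cstring>
#include <string>
#include <unordered_map>

struct TweakEntry { float value; };

// Key each tweakable on "file:line" so lookup is a single hash.
static std::unordered_map<std::string, TweakEntry> gTweaks;

float TweakValue(float v, const char* fileName, int lineNumber)
{
    std::string key = std::string(fileName) + ":" + std::to_string(lineNumber);
    auto it = gTweaks.find(key);
    if (it == gTweaks.end())
        it = gTweaks.insert({key, {v}}).first;
    return it->second.value;
}

// Called when the source file changes on disk: read the given line,
// find "TWEAK(" and parse the literal between the parentheses.
void ReparseTweak(const char* fileName, int lineNumber)
{
    FILE* f = std::fopen(fileName, "r");
    if (!f) return;
    char line[1024];
    for (int i = 1; std::fgets(line, sizeof(line), f); ++i)
    {
        if (i != lineNumber) continue;
        if (const char* p = std::strstr(line, "TWEAK("))
        {
            float newValue = 0.0f;
            if (std::sscanf(p + 6, "%f", &newValue) == 1)
            {
                std::string key = std::string(fileName) + ":" + std::to_string(lineNumber);
                gTweaks[key].value = newValue;
            }
        }
        break;
    }
    std::fclose(f);
}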

One limitation is that it only works for one tweakable parameter per line. It's probably possible to make it work for more than one, but that requires a lot more parsing. Note that this can be done for more than just floats. I've also added support for booleans, vectors and colors. The boolean especially can be useful to toggle between two implementations at run time:

if (TWEAK(true))

Needless to say, in production builds the TWEAK macro is just bypassed, adding zero overhead. Pretty neat, isn't it?
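
For completeness, the macro switch could be set up along these lines (my own sketch; the ENABLE_TWEAK flag is an assumed build setting, not necessarily how the original is written):

#ifdef ENABLE_TWEAK
    // Development build: route through the hot reloading machinery.
    #define TWEAK(value) TweakValue(value, __FILE__, __LINE__)
#else
    // Production build: the macro collapses to the literal itself, zero overhead.
    #define TWEAK(value) (value)
#endif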

Friday, February 9, 2018

Header file dependencies

Ten years ago, I wrote my own C++ software framework, and it was probably one of the best moves in my career as a software developer. It has been immensely useful for every little project I have done ever since, but adding bits and pieces and modifying it down the road has made the software quality slowly degrade. I'm half-way through a rewrite, not from scratch but a pretty serious overhaul. One thing I've spent a lot of time on is reducing header file dependencies to improve compile times. It is one of those strangely satisfying things that you can never really justify spending time on while in production. So far I've managed to cut the compile time in half (from 17 seconds for a full rebuild down to 9, so it really wasn't that bad before either), mostly by eliminating system headers.

A very accessible tool for this is actually GCC. Just add the -H flag and it will print out a hierarchical header dependency graph, including system headers. Using this I found out that the <new> system header, required for the standard placement new operator, pulled in over 50 other system headers. I could get rid of it thanks to a tip I got on Twitter, by just declaring my own placement new operator like this:

// Custom placement new that avoids pulling in the <new> system header.
// The dummy enum parameter keeps this overload from clashing with the standard one.
enum TMemType { MEMTYPE_DUMMY };
inline void* operator new(size_t, TMemType, void* ptr) { return ptr; }

Then I just use T_PNEW instead of new in all of my templated containers.
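
The T_PNEW macro isn't shown above, but it could be something as simple as this (my guess at its shape, built on the dummy enum; not necessarily the original definition):

// Hypothetical helper that forwards to the custom placement operator above.
#define T_PNEW(ptr) new (MEMTYPE_DUMMY, (ptr))

// Example usage inside a container, constructing an element in pre-allocated storage:
// T_PNEW(mData + index) T(value);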

I've also moved over to private implementation (pimpl) as a standard practice. It's a bit clunky and requires an extra pointer dereference for every call, but it reduces dependencies a lot. Most of my headers now only include "tconfig.h" with standard types, and sometimes containers. The only place I found this extra dereference to be unacceptable was the mutex, but I also didn't want to include system headers in my own headers, so the ugly but functional compromise I settled on was to reserve memory with a char mMutexStorage[T_MUTEX_SIZE] in the header and then do a placement new in the implementation file. The value of T_MUTEX_SIZE is platform dependent, but can easily be compared to the actual mutex size at runtime to avoid memory corruption.
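
A rough sketch of that mutex compromise (the names, the size constant and the use of std::mutex below are placeholders of mine, not the actual framework code):

// tmutex.h -- no system headers needed here.
#define T_MUTEX_SIZE 64   // Example value, platform dependent.

class TMutex
{
public:
    TMutex();
    ~TMutex();
    void lock();
    void unlock();
private:
    // Raw storage for the platform mutex, constructed in the implementation file.
    alignas(8) char mMutexStorage[T_MUTEX_SIZE];
};

// tmutex.cpp -- system headers are confined to the implementation file.
#include <mutex>
#include <new>
#include <cassert>

TMutex::TMutex()
{
    // Catch memory corruption early if the reserved storage is too small.
    assert(sizeof(std::mutex) <= sizeof(mMutexStorage));
    new (mMutexStorage) std::mutex();
}

TMutex::~TMutex()
{
    reinterpret_cast<std::mutex*>(mMutexStorage)->~mutex();
}

void TMutex::lock()   { reinterpret_cast<std::mutex*>(mMutexStorage)->lock(); }
void TMutex::unlock() { reinterpret_cast<std::mutex*>(mMutexStorage)->unlock(); }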

Finally, I must mention something that took me a long time as a developer to realize – return types can be forward declared even when returned by value. It makes sense if you know how they are passed on the stack, but I somehow always just assumed the compiler needed to know the size unless it was a pointer or reference. I learned this maybe five years ago, and every compiler I have come across since then accepts forward declarations for return types. That makes a huge difference, because you can do this:

class TString getSomeInformation();


class TVec3 getWorldPosition();

Or any type you like, without including the actual header for it. This means you can get away with pretty much zero includes as long as the implementation is private without compromising your API. I'm pretty sure this works for parameters as well, but since they are usually passed as const references anyway (mine are at least) it's not such a big deal.

The only exception I found was MSVC, which does not allow this for cast operators, but luckily it can be worked around by forward declaring outside the class instead of inlining it in the method declaration.

Speaking of forward declarations, this is also why I don't use namespaces. I always make inline forward declarations like the ones above, never at the top. It would have been so convenient to make inline forward declarations of namespace members, like this:

class mylib::math::Vec3 getWorldPosition();

But there is no such thing and I hate those clunky forward namespace declarations at the top of the header, so I'll stick with prefixes for now.

Tuesday, January 2, 2018

Screen Space Path Tracing – Diffuse

The last few posts have been about my new screen space renderer. Apart from a few details I haven't really described how it works, so here we go. I split up the entire pipeline into diffuse and specular light. This post will focus on diffuse light, which is the hard part.

My method is very similar to SSAO, but instead of doing a number of samples on the hemisphere at a fixed distance, I raymarch every sample against the depth buffer. Note that the depth buffer is not a regular, single value depth buffer, but each pixel contains front and back face depth for the first and second layer of geometry, as described in this post.

The increment for each step is not view dependent, but fixed in world space, otherwise shadows would move with the camera. I start with a small step and then increase the step size exponentially until I reach a maximum distance, at which point the ray is considered a miss. Needless to say, raymarching multiple samples for every pixel is very costly, and this is without a doubt the most time consuming part of the renderer. However, since light is usually fairly low frequency information, it can be done at lower resolution and upscaled using this technique. I was also surprised by how few steps are needed for each ray, as long as the step size grows exponentially, starting small and increasing gradually. This will capture fine detail near creases while also preserving occlusion from bigger obstacles nearby.

In the case of a ray miss, I fetch light from a low resolution environment map and in case of a hit, I fetch light from the hit pixel reprojected from the previous frame. This creates a very crude approximation of global illumination since light is able to bounce between surfaces across multiple frames. This is also what enables me to light objects using emissive materials as shown in this video. Here is pseudo code for the light sampler:

function sampleLight(pixel, dir)
  stepSize = smallDistance
  pos = worldPos(pixel)
  for each step
    pos += dir * stepSize
    if pos is occluded by the first or second depth layer at pixel(pos) then
      return fraction of light from pixel(pos) in previous frame
    stepSize *= gamma (gamma > 1, so the step size grows exponentially)
  return fraction of light transfer from cubeMap[dir]

Like any path tracing, what comes out will contain a certain amount of noise. Even on a powerful machine and at half resolution, there isn't time for more than a handful of samples per pixel, so noise reduction is a very important step. This is what it looks like without any noise reduction, at sixteen samples per pixel, with each sample marched in twelve steps. Click image to view high resolution.

After applying a temporal reprojection filter, I get rid of a lot of the noise. Note that the filter runs on the light buffer before it gets applied to the scene using object color, etc. The problem with temporal filters is of course that areas where information is missing will be very noticeable during motion. Therefore, this filter cannot be too aggressive. I keep it around 70% and I also compare depth values in order to avoid ghosting.
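
To illustrate the idea, the per-pixel blend could look something like this (a CPU-style C++ sketch, not my actual shader; the names and the depth threshold are made up):

#include <algorithm>
#include <cmath>

struct Vec3 { float r, g, b; };

// Blend the current noisy light sample with the reprojected history,
// rejecting the history when the reprojected depth no longer matches.
Vec3 TemporalFilter(Vec3 current, Vec3 history, float currentDepth, float historyDepth)
{
    const float historyWeight = 0.7f;     // Roughly 70% history, as described above.
    const float depthTolerance = 0.01f;   // Made-up threshold, scene dependent.

    float relativeError = std::fabs(currentDepth - historyDepth) / std::max(currentDepth, 1e-6f);
    float w = (relativeError < depthTolerance) ? historyWeight : 0.0f;

    Vec3 result;
    result.r = current.r * (1.0f - w) + history.r * w;
    result.g = current.g * (1.0f - w) + history.g * w;
    result.b = current.b * (1.0f - w) + history.b * w;
    return result;
}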

After the temporal filter, I also apply a spatial filter that uses object ID, or more specifically smoothing group ID, to not blur across different surfaces. Again, this filter cannot be too aggressive, or fine detail will get lost. I use a 7x7 pixel kernel in two passes.
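
One horizontal pass of that edge-aware blur could be sketched like this (again a CPU-style illustration with invented names, reusing the Vec3 struct from the previous sketch; the real thing is a shader, and a vertical pass follows):

// 7x7 kernel split into two passes, so radius 3 per pass.
// Only neighbours sharing the centre pixel's smoothing group ID contribute.
Vec3 SpatialFilterH(int x, int y, const Vec3* light, const int* groupId, int width)
{
    const int radius = 3;
    int centreGroup = groupId[y * width + x];

    Vec3 sum = {0.0f, 0.0f, 0.0f};
    int count = 0;
    for (int dx = -radius; dx <= radius; ++dx)
    {
        int sx = x + dx;
        if (sx < 0 || sx >= width) continue;
        int idx = y * width + sx;
        if (groupId[idx] != centreGroup) continue;   // Don't blur across surfaces.
        sum.r += light[idx].r;
        sum.g += light[idx].g;
        sum.b += light[idx].b;
        ++count;
    }
    return { sum.r / count, sum.g / count, sum.b / count };
}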

There is still some low frequency noise present, but most of it gets eaten by temporal anti-aliasing, and the final image looks like this.

Tuesday, November 21, 2017

A better depth buffer for raymarching

When doing any type of raymarching over a depth buffer, it is very easy to determine if there is no occluder – the depth in the buffer is farther away than the current point on the ray. However, when the depth in the buffer is closer, you might be occluded or you might not, depending on a) the thickness of the occluder and b) whether there are any other occluders behind the first one, and their thickness. It seems most people assume a) is either infinite or a constant value, and b) is ignored altogether.

Since my new renderer is entirely based around screen space raymarching I wanted to improve on this to make it more accurate. This has been done before, but mostly in the context of order independent transparency (I think).

Let's look at a scene where the occluders are assumed to have infinite depth (I have tweaked the lighting for more distinct shadows to get a better look at raymarching artefacts, so the lighting does not exactly match the environment in these screenshots).

At a first glance it may look okay, but at certain angles, it is very evident that something is off:

Even an object that is visibly thin will cast a shadow as if it were infinitely thick. The go-to trick in this situation is to hardcode a thickness and tweak it until it looks acceptable:

Still artefacts, but much better. However, for most scenes it's just not possible to find one single thickness that works for everything. What we ideally want is the actual object thickness per pixel. One relatively cheap way of approximating thickness is to render a depth buffer for back faces. As long as objects don't overlap, are closed and reasonably convex, the difference between front face depth and back face depth is actually a pretty accurate representation of the object thickness.

I store front face and back face depth in different channels of the same texture, so I just retrieve RG instead of R for each pixel and compare the depth to both values when raymarching, making it really cheap. This removes a lot of artefacts, but there is still room for improvement.

It is hard to visualize in a still image, but with a moving camera it becomes very clear that shadows are only visible for the first layer of objects. As soon as an object disappears behind something, its shadow is also gone. This is of course particularly evident with long shadows from, say, a sunset.

Creating another layer of depth information is called depth peeling and there are several ways to do it. I use the stencil buffer, but it can also be done by discarding fragments in a shader. I already mentioned that I store front and back face depth in two different channels of the same texture, so why not add another layer of front and back face depth and make it a full, four channel texture? All four depth values (first front, first back, second front, second back) can still be fetched as a single texture read, making it really fast.
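
The occlusion test against such a four-channel depth texel then boils down to a couple of interval checks per ray sample; here is a sketch of the idea (the names are my own):

// Depth of the four geometry layers stored in one texel:
// first front face, first back face, second front face, second back face.
struct DepthLayers { float firstFront, firstBack, secondFront, secondBack; };

// A ray sample is occluded if its depth falls inside either object interval.
bool IsOccluded(float rayDepth, const DepthLayers& d)
{
    if (rayDepth > d.firstFront && rayDepth < d.firstBack)
        return true;
    if (rayDepth > d.secondFront && rayDepth < d.secondBack)
        return true;
    return false;
}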

One could imagine doing even more depth layers, but the visual improvement would be hard to notice.

Thursday, November 9, 2017

Upscaling half resolution screen space effects

When working with diffuse lighting and ambient occlusion in screen space it is often very tempting to do computations in lower resolution. Most of it is blurry anyway, and for any kind of GI/path tracing, diffuse lighting is undoubtedly the bottleneck. Here is a test scene with all colours set to white and no textures.

Enabling only the diffuse lighting, the image looks strangely familiar.

You quickly realise that diffuse lighting is the lion's share of the entire image. Since everything is the same colour, two overlapping objects can be told apart only because they differ in diffuse lighting. Therefore, lowering the resolution of diffuse lighting also means that a lot of edges will be half resolution and the same diffuse lighting suddenly looks like this.

Not acceptable (click on image to view full resolution), but note that the image looks perfectly fine over larger areas where there are no edges, and also at the contours towards the skybox. I've come to think of two solutions to this problem:

1) Render at half resolution. Detect edges and re-render pixels near edges during upsampling. This would probably work very well, but I haven't tried it yet.

2) A cheaper solution would be to cover up faulty pixels on the edges using neighbouring pixels from the same surface (it's all blurry, remember?), practically retouching the edges much the same way you retouch images in Photoshop.

I decided to try the latter and got some interesting results. First I create a 2D "retouching" vector field. It is basically just a distance offset, telling each pixel where to fetch its samples. In the middle of a surface this will be (0,0) and near an edge it will point away from the edge. If you have any way of classifying surfaces in a shader, this is actually really cheap to do. I just use a unique number for each smoothing group to identify smooth surfaces, and for each pixel I check the eight neighbouring pixels and average the offsets of the ones that are in the same smoothing group. Ta-da, the average offset will now point in a direction away from each edge, and the retouch vector field looks something like this (here visualized upscaled and with absolute values):

Now if you process the downscaled, half resolution, diffuse lighting through this retouch field during upscaling, the resulting image will magically look like this:

Congratulations, you just saved ~75% processing time for your diffuse lighting. However, there are artifacts, as always. But I found the results to be acceptable in most situations. Computing diffuse lighting in half resolution (quarter pixel count) allowed me to do eight samples per pixel instead of two, resulting in more accurate lighting and less noise.
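
For reference, the per-pixel construction of the retouch field could look roughly like this (a CPU-style sketch with assumed names; the real version runs as a shader):

#include <utility>

// Offset pointing away from the nearest edge, computed from the eight neighbours
// that belong to the same smoothing group as the centre pixel.
std::pair<float, float> RetouchOffset(int x, int y, const int* smoothingGroup,
                                      int width, int height)
{
    int centre = smoothingGroup[y * width + x];
    float offsetX = 0.0f, offsetY = 0.0f;
    int count = 0;
    for (int dy = -1; dy <= 1; ++dy)
    {
        for (int dx = -1; dx <= 1; ++dx)
        {
            if (dx == 0 && dy == 0) continue;
            int sx = x + dx, sy = y + dy;
            if (sx < 0 || sx >= width || sy < 0 || sy >= height) continue;
            if (smoothingGroup[sy * width + sx] != centre) continue;
            offsetX += (float)dx;
            offsetY += (float)dy;
            ++count;
        }
    }
    // In the middle of a surface the neighbours cancel out, so the offset is (0,0).
    // Near an edge only the inward neighbours remain and the offset points away from it.
    if (count > 0) { offsetX /= count; offsetY /= count; }
    return { offsetX, offsetY };
}

During upscaling, each full resolution pixel then samples the half resolution light buffer at its own position plus this offset.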

Another really nice property of the retouch vector field is that once you've created it, you can reuse the same field for any screen space upscaling you might do. I for instance reuse the same field when upscaling screen space reflections, and I'm hoping to use it also for smoke particles once I get there.

Sunday, October 15, 2017

Depth of field in VR

I have always been very fascinated by depth of field in computer graphics. For me, it often defines photo realism, mimicking the shortcomings of a real camera. Naturally it is a poor match for interactive applications because the computer doesn't know what the user is looking at, but I've tried to squeeze in depth of field in as many of the Mediocre games as I could get away with. In Smash Hit I wanted to use it for everything, but Henrik thought it looked too weird and made it harder to aim (he was probably right), so we ended up enabling it only in the near field. In Does not Commute and PinOut, which both have fixed camera angles, I'm doing a wonderful trick, enabling a slight depth of field by seamlessly blurring the upper and lower parts of the screen. This is super cheap and takes away depth artefacts completely.

Anyway, since I'm so obsessed with depth of field I've started experimenting with it in VR. This has opened up a whole can of new problems and frustrations and I'd like to share some of my findings so far.

When I first tried it out with a fixed focal length it just looked weird, and I couldn't really put my finger on why it was so different from a flat screen. Being able to focus on the blur itself gives an extremely artificial look. Some people say your eyes can't focus on different things in VR because the screen is always at the same distance from your eyes. This is only partly correct. You can still turn your eyes independently in VR, and a lot of what we perceive as focus in real life is not related to the lens in the eye but to the angle of the eyes (vergence). This is what causes double vision behind or in front of what you're focusing on, and it is probably a far more important depth cue than the blur itself. This is already in VR "for free". It wasn't until I tried depth of field in VR that I understood why all the VR experiences I've tried have felt a bit "messy", like a three dimensional clutter of too much information. When the depth of field is there and coincides with the double vision that is already there, it gives a certain calmness to the image that is very pleasant.

Using depth of field as a tool for directing the viewer's attention towards a specific area is probably never going to work in VR. I'm still not sure why it works so extremely well in 2D but fails so miserably in VR, but it's probably because the viewer in VR can in fact "focus" on the out of focus areas by adjusting the angle of the eyes. However, depth of field could still add a lot to VR by removing the visual clutter, not really adding anything new but making the experience less painful.

In order to make an adaptive depth of field, one that adjusts the focal length dynamically, I've implemented a system that is pretty similar to what has been used in (D)SLR cameras for decades – a set of focus points that are all weighted together, with the central points more dominant. For each focus point I shape cast a small sphere from the camera and record the hit distance. The reason I'm using sphere shape casting instead of raycasting is that I want focus points to ignore minor gaps between geometry. It also gives smoother focus transitions when moving the camera.

The focus point system works relatively well, but it tends to ignore small objects in the foreground, not because the shape cast misses them (it does not), but because very few of the focus points hit them, making only a minor contribution to the final focal length. To overcome this I introduced a minimum and maximum forced focus range, in which everything is in focus. We are now leaving physical territory, because with a real lens there is only a single depth where objects are in focus. However, a forced focus range is definitely not any less accurate than having focus everywhere, so who cares. The forced focus range starts at the focus point-computed focal length. For any focus point closer than that, I simply adjust the range to include that distance. This modification turned out pretty well. It tends to keep most objects in the center area of the screen in focus all the time, so to some extent it cancels out the whole point of adding depth of field, but having out of focus objects in the peripheral vision and behind the main objects in the center gives a much more pleasant looking image.
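
In code, the weighting and the forced focus range could be sketched roughly like this (hypothetical names; the sphere casts themselves are assumed to be done elsewhere by the physics engine):

#include <algorithm>
#include <vector>

struct FocusPoint
{
    float weight;        // Central focus points get larger weights.
    float hitDistance;   // Distance returned by the sphere cast, or a far value on a miss.
};

// Weighted average of all focus point distances gives the focal length.
float ComputeFocalLength(const std::vector<FocusPoint>& points)
{
    float sum = 0.0f, weightSum = 0.0f;
    for (const FocusPoint& p : points)
    {
        sum += p.hitDistance * p.weight;
        weightSum += p.weight;
    }
    return (weightSum > 0.0f) ? sum / weightSum : 0.0f;
}

// The forced focus range starts at the computed focal length and is extended
// towards the camera to include any focus point that is closer than that.
void ComputeFocusRange(const std::vector<FocusPoint>& points, float& nearFocus, float& farFocus)
{
    float focal = ComputeFocalLength(points);
    nearFocus = focal;
    farFocus = focal;
    for (const FocusPoint& p : points)
        nearFocus = std::min(nearFocus, p.hitDistance);
}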

It does have some drawbacks of course. You can still focus on the blur, but it really only looks weird when it happens in the near field, which with the focus range modification can only happen if you look away from the center (and honestly, the lenses in today's HMDs are so bad that everything is blurry there anyway).

So is it worth the hassle? I'm not sure, but I think so. When it works it looks truly awesome, and when it fails it's definitely annoying. I'll keep experimenting with this one.

Finally, here is how I do the actual depth of field. There is probably nothing new in there, but everyone does it a little differently. This is my version:

1) Fill up the DOF (depth of field) buffer using the final composited image before bloom and tone mapping. Store it in an RGBA texture, where the alpha channel represents the amount of blur. I use a half size texture for this. To compute the blur amount, I'm using the formula k*(1/focalPoint-1/distance) (see the sketch after this list).

2) Blur the alpha channel of the DOF buffer horizontally and vertically. The size of the blur is based on the alpha value. The blur must be depth aware, so that: A) Fragments further away than the current fragment do not contribute to the blur. This prevents objects behind something from blurring what is in front. B) Fragments that are closer do contribute, but only scaled by their alpha value. This makes blurry objects in front of something sharp bleed out over their physical extent.

3) Blur the RGB values of the DOF buffer horizontally and vertically based on the blurred alpha. This pass must also be depth aware in exactly the same way as B) above.

4) Mix the RGB values of the DOF buffer into the final image using the blurred alpha channel.
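
As a small illustration of step 1, the blur amount can be computed like this (the constant k is scene dependent, and the abs() and the clamp are my own additions to keep the stored value in range):

#include <algorithm>
#include <cmath>

// Amount of blur (circle of confusion) for a fragment at the given distance.
// k scales the overall size of the blur; focalPoint is the current focal distance.
float BlurAmount(float distance, float focalPoint, float k)
{
    float blur = k * std::fabs(1.0f / focalPoint - 1.0f / distance);
    return std::min(blur, 1.0f);   // Stored in the alpha channel, so clamp to [0,1].
}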

Wednesday, September 13, 2017

Adventures in Screen Space

Eight years ago, just when I first started writing this blog, my second post was about screen space ambient occlusion. I used that renderer for all my physics experiments, leading up to the fluid simulation that became Sprinkle. At that point I left desktop computing in favor of mobile devices. Ten games later I'm now back on desktop machines, and I'm completely blown away by all the computing power.

For the first Sprinkle game I had to make dedicated geometry with holes in it when drawing large alpha blended overlays because the fill rate was so terrible. Now I'm running hundreds of lines of code doing really complex computations per pixel. Sorry, you have probably already adjusted, but this will take me a while.

So what could be more fitting than to freshen up that old physics renderer (well, more like starting from scratch, but still)? I have been wanting to experiment with physics in VR for a while, and now is the time. For this I need a renderer that can handle a truly dynamic world with no precomputed lighting.

I have implemented screen space ambient occlusion with temporal reprojection filtering that takes a lot of the noise away without smearing out the result. I've always hated shadow maps. They are hard to implement and the result is usually disappointing, so for this renderer I tried doing shadows entirely in screen space using ray marching towards the light source. It's a bit of an experiment, but I find the results really interesting. The characteristics are very different from regular shadow maps – instead of getting precise but jagged shadows this one gives imprecise and smooth, blurry shadows. I can't really decide if I like it or not. For a sunny outdoor setting, regular shadow maps are probably better, but for the more diffuse, indoor lighting this is quite promising.

There is also depth of field close to the camera, done in four passes at half resolution, and motion blur on everything. I'm going for an old, analogue look in the final result, so any imperfections that tone down the artificial computer graphics characteristics are a good thing. The ambient occlusion and screen space shadows do add a little bit of noise, but there is one cheap and paradoxically efficient way of hiding unwanted noise: add more noise. So at the final stages of the pipeline I add 5-7% of greyscale noise, which hides some of the noise in occluded areas and adds to the analogue look.

I have a bloom pass as well, and I just started playing with tone mapping. I'm not sure I'm really getting it, but I'll keep experimenting. For anti-aliasing, my friend Ludde Andersson over at Scaupa pointed me to a temporal reprojection method that I found very interesting. Since I'm already doing temporal reprojection for the occlusion and shadows, it was quite easy to do the same for anti-aliasing. The idea is to move the viewport at sub-pixel resolution every frame and smooth out the result with an accumulation buffer. It also turned out that one of my absolute favourite games, Inside, has a great presentation on the topic from last year's GDC. The results are absolutely stunning. I'm not sure I have ever come across a new rendering technique that is so clever and simple yet produces such fantastic results with almost no computational overhead. Am I missing something, or why isn't everybody using this?
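
To illustrate the idea (not Inside's exact implementation), the per-frame jitter and the accumulation can be as simple as this; the Halton sequence and the blend factor are my own choice of example values:

// Halton sequence gives a well-distributed sub-pixel jitter pattern.
float Halton(int index, int base)
{
    float result = 0.0f;
    float f = 1.0f;
    while (index > 0)
    {
        f /= base;
        result += f * (index % base);
        index /= base;
    }
    return result;
}

// Offset the projection by a fraction of a pixel each frame...
void GetJitter(int frameIndex, int width, int height, float& jitterX, float& jitterY)
{
    int i = (frameIndex % 16) + 1;
    jitterX = (Halton(i, 2) - 0.5f) / width;    // Sub-pixel offset as a fraction of the viewport.
    jitterY = (Halton(i, 3) - 0.5f) / height;   // Scale by 2 if applied in normalized device coordinates.
}

// ...and blend the new, jittered frame into the reprojected accumulation buffer.
float Accumulate(float history, float current, float blend)
{
    return history * (1.0f - blend) + current * blend;   // A blend of around 0.1 is a reasonable starting point.
}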