When I implemented bokeh depth of field I stumbled upon a neat blending trick almost by accident. In my opinion, the quality of depth of field is more related to how objects of different depths blend together, rather than the blur itself. Sure, bokeh is nicer than gaussian, but if the blending is off the whole thing falls flat. There seems to be many different approaches to this out there, most of them requiring multiple passes and sometimes separation of what's behind and in front of the focal plane. I experimented a bit and stumbled upon a nice trick, almost by accident.

I'm not going to get into technical details about lenses, circle of confusion, etc. It has been described very well many times before, so I'm just going to assume you know the basics. I can try to summarize what we want to do in one sentence – render each pixel as a discs where the radius is determined by how out of focus it is, also taking depth into consideration "somehow".

Taking depth into consideration is the hard part. Before we even start thinking about how that can be done, let's remind ourselves that there is no correct solution to this problem. We simply do not have enough information. Real depth of field needs to know what's behind objects. In a post processing pass we don't know that, so whatever we come up with will be an approximation.

I use a technique sometimes referred to as

Now to the hard part. When gathering discs, each one of them have different sizes. That in itself it not very hard, just compare the distance to the disc radius. If it's smaller it contributes, otherwise not. The hard part is how it should contribute and when. Taking a stab at a "correct" solution, I compute how much a sample from each disc contributes. A larger disc means that the pixel intensity will get spread out over a larger area, so each individual sample gives a smaller contribution. Doing the math right, summing up everything in the end should yield the same overall intensity in the image. This is what it looks like:

It's not totally wrong, but it's definitely not right. Foreground and background objects do get blurred, but foreground objects get darkened around edges. You might also notice bright streaks across the focus areas. The streaks are artefacts from working with discrete samples in something that should be continuous. It's hard to do the correct thing when the disc size approaches zero, since you have to take at least one sample. What about the dark edges? If we did this the right way, we would actually blend in each disc, allowing some of the background to shine through. Since that information is occluded this is what we get.

Instead of trying to compute the correct intensity from each disc sample, let's just sum them up and remember the total, then divide with the total afterwards. This is pretty classic conditional blur and always gives the correct intensity:

for each sample

if sample contributes then

color += sample * contribution

total += contribution

end

end

color /= total

Definitely better. Both the streaks and the dark corners are gone, but foreground objects don't blur correctly over a sharp background. This is because the contribution from in-focus objects have a stronger weight than the large discs in the foreground. There are several ways to attack this issue. One that I found to look relatively good is to compute the contribution using the distance to the sample instead of the disc size. This is far from correct, but gives a nicer fall-off:

Now to the trick. Instead of computing a contribution based on disc size or distance, I use a fixed weight for all contributing samples no matter the distance or disc size, and blend in the current average if there is no contribution:

color = centerColor

total = 1.0

for each sample

if sample contributes then

color += sample

else

color += color/total

end

total += 1.0

next

color /= total

Huh? What this does is basically to give each contributing sample equal weight, but when there is no contribution, the current average is allowed to grow stronger and get more impact on the result. This is what it looks like:

Not bad. The blur is continuous and nice for both foreground and background objects. Note that this only works when sampling in a spiral, starting at the center pixel being shaded, because the first sample needs to be the pixel itself. I'm not going to try and explain details on why this algorithm works, simply because I'm not sure. I found it by accident, and I have spent days trying other methods but nothing works as well as this one.

There is one small thing to add before this is usable. The background shouldn't blur over foreground object that are in focus. Note though that if the foreground is out of focus, we want some of the background to blur in with it. What I do is to clamp the background circle of confusion (disc size) between zero and the circle of confusion of the center pixel. This means that the background can contribute only up to the amount of blurryness of the pixel being shaded. The scatter-as-gather way of thinking requires a lot of strong coffee.

if sample depth > center depth then

sample size = clamp(sample size, 0, center size)

end

Here is what the final image looks like:

A couple of notes on my implementation. After some experimentation I changed the clamping of sample size to clamp(sample size, 0, center size*2.0). For larger max radius I increase it even more. This controls how much of the background gets blended into a blurry foreground. It is totally unphysical and is only there to approximate the occluded information behind the foreground object.

The following code is written for clarity rather than speed. My actual implementation is in two passes, calculating the circle of confusion for each pixel and stores in the alpha component, at the same time downscaling the image to quarter resolution (half width, half height). When computing the depth of field I also track the average sample size and store in the alpha channel, then use this to determine wether the original or the the downscaled blurred version should be used when compositing. More on performance optimizations in a future post.

I'm not going to get into technical details about lenses, circle of confusion, etc. It has been described very well many times before, so I'm just going to assume you know the basics. I can try to summarize what we want to do in one sentence – render each pixel as a discs where the radius is determined by how out of focus it is, also taking depth into consideration "somehow".

Taking depth into consideration is the hard part. Before we even start thinking about how that can be done, let's remind ourselves that there is no correct solution to this problem. We simply do not have enough information. Real depth of field needs to know what's behind objects. In a post processing pass we don't know that, so whatever we come up with will be an approximation.

I use a technique sometimes referred to as

*scatter-as-gather*. I like that term, because it's very descriptive. GPU's are excellent at reading data from multiple inputs, but hopelessly miserable at writing to multiple outputs. So instead of writing out a disc for each pixel, which every sane programmer would do in a classic programming model, we have to do the whole operation backwards and compute the sum of all contributing discs surrounding a pixel instead. This requires us to look as far out as there can be any contributions and the safest such distance is the maximum disc radius. Hence, every pixel becomes the worst case. Sounds expensive? It is! There are ways to optimize it, but more on that later. Here is the test scene before and after blur with a fixed disc size for all pixels:Now to the hard part. When gathering discs, each one of them have different sizes. That in itself it not very hard, just compare the distance to the disc radius. If it's smaller it contributes, otherwise not. The hard part is how it should contribute and when. Taking a stab at a "correct" solution, I compute how much a sample from each disc contributes. A larger disc means that the pixel intensity will get spread out over a larger area, so each individual sample gives a smaller contribution. Doing the math right, summing up everything in the end should yield the same overall intensity in the image. This is what it looks like:

It's not totally wrong, but it's definitely not right. Foreground and background objects do get blurred, but foreground objects get darkened around edges. You might also notice bright streaks across the focus areas. The streaks are artefacts from working with discrete samples in something that should be continuous. It's hard to do the correct thing when the disc size approaches zero, since you have to take at least one sample. What about the dark edges? If we did this the right way, we would actually blend in each disc, allowing some of the background to shine through. Since that information is occluded this is what we get.

Instead of trying to compute the correct intensity from each disc sample, let's just sum them up and remember the total, then divide with the total afterwards. This is pretty classic conditional blur and always gives the correct intensity:

for each sample

if sample contributes then

color += sample * contribution

total += contribution

end

end

color /= total

Definitely better. Both the streaks and the dark corners are gone, but foreground objects don't blur correctly over a sharp background. This is because the contribution from in-focus objects have a stronger weight than the large discs in the foreground. There are several ways to attack this issue. One that I found to look relatively good is to compute the contribution using the distance to the sample instead of the disc size. This is far from correct, but gives a nicer fall-off:

Now to the trick. Instead of computing a contribution based on disc size or distance, I use a fixed weight for all contributing samples no matter the distance or disc size, and blend in the current average if there is no contribution:

color = centerColor

total = 1.0

for each sample

if sample contributes then

color += sample

else

color += color/total

end

total += 1.0

next

color /= total

Huh? What this does is basically to give each contributing sample equal weight, but when there is no contribution, the current average is allowed to grow stronger and get more impact on the result. This is what it looks like:

Not bad. The blur is continuous and nice for both foreground and background objects. Note that this only works when sampling in a spiral, starting at the center pixel being shaded, because the first sample needs to be the pixel itself. I'm not going to try and explain details on why this algorithm works, simply because I'm not sure. I found it by accident, and I have spent days trying other methods but nothing works as well as this one.

There is one small thing to add before this is usable. The background shouldn't blur over foreground object that are in focus. Note though that if the foreground is out of focus, we want some of the background to blur in with it. What I do is to clamp the background circle of confusion (disc size) between zero and the circle of confusion of the center pixel. This means that the background can contribute only up to the amount of blurryness of the pixel being shaded. The scatter-as-gather way of thinking requires a lot of strong coffee.

if sample depth > center depth then

sample size = clamp(sample size, 0, center size)

end

Here is what the final image looks like:

A couple of notes on my implementation. After some experimentation I changed the clamping of sample size to clamp(sample size, 0, center size*2.0). For larger max radius I increase it even more. This controls how much of the background gets blended into a blurry foreground. It is totally unphysical and is only there to approximate the occluded information behind the foreground object.

The following code is written for clarity rather than speed. My actual implementation is in two passes, calculating the circle of confusion for each pixel and stores in the alpha component, at the same time downscaling the image to quarter resolution (half width, half height). When computing the depth of field I also track the average sample size and store in the alpha channel, then use this to determine wether the original or the the downscaled blurred version should be used when compositing. More on performance optimizations in a future post.

Code layout is messed up, everything is in one line..

ReplyDeleteFixed! Thanks for letting me know.

DeleteSpeaking of layout, your blog would benefit from having a wider content section, I think.

DeleteLooks amazing! Well done. I had to many passes only to end up w a worse result. Note that the shader source code new lines don’t show up. (I used safari)

ReplyDeleteThank you, I have fixed code layout. The webs..

DeleteThanks for the nice article.

ReplyDeleteIf anyone wants a bonus challenge, consider the total distance of reflected objects when applying focal blur.

Good point... I haven't really thought of that. For fully reflective materials that sounds like a good idea, otherwise I doubt one would notice much difference. Do you know of any (post effect) implementations that accounts for that?

Deletefloat coc = clamp((1.0 / focusPoint - 1.0 / depth)*focusScale, -1.0, 1.0);

ReplyDeleteSo `focusPoint` represents the focal distance in camera space and `depth` represents the (linear) distance in camera space? How does this statement relate to the calculation of the diameter of the circle-of-confusion: focusScale*abs(depth - focusPoint) / depth = focusScale*abs(1 - focusPoint/depth) (Cfr. http://www.reedbeta.com/blog/circle-of-confusion-from-the-depth-buffer/, https://en.wikipedia.org/wiki/Circle_of_confusion)? Furthermore, clamping and taking the absolute value of this diameter of the CoC seems like some normalization which is followed by a multiplication by MAX_BLUR_SIZE. But why couldn't the actual (originally-obtained) diameter be used instead?

Thanks in advance.

You are absolutely correct. I first clamp to -1, 1 and then multiply with MAX_BLUR_SIZE, but another way would be to use the coc value directly and instead clamp between -MAX_BLUR_SIZE and MAX_BLUR_SIZE (Negative values are needed for the depth compare to work). The focusScale parameter is just a made up parameter that depends on the properties of the lens.

DeleteBut if this ad-hoc focusScale is replaced by the more touchable aperture diameter, you have completely determined the CoC of that sample without the need of the MAX_BLUR_SIZE (which you still need as fixed threshold for efficiently restricting the search space while gathering). If you map that CoC (radius) from camera to screen space, you know the region of influence of each pixel. (I am just thinking out loud; not sure if it makes sense :) )

ReplyDeleteI think in that case you'd need the focal plane distance into the equation as well, so there would be two parameters instead of one(?). It's probably doable, but I'm not sure I see the point as this whole thing is just a hack anyway :)

DeleteThat said, I'll probably clamp to -MAX_BLUR_SIZE, MAX_BLUR_SIZE instead of -1,1 as you suggested. That way the characteristics of the blur does not depend on the performance setting, and it saves one instruction in the inner loop (not that this is ALU bound anyway, but still).

DeleteIsn't "focusPoint" the focal distance? E.g. CoC_diameter = aperture_diameter * abs(1 - z_focus / z_camera) (everything expressed in camera space distance). This CoC_diameter can be transformed to a pixel_diameter and if you have no contribution you can still use your hack and give more weight to the current average?

DeleteFocus point in my code is the distance to an object in perfect focus. In order to compute the CoC diameter I think you need the aperture size and the distance from the lens to the "film" (focal plane). I'm really no expert in optics, so I might be completely wrong here, but I don't see how just the aperture could determine CoC size in a real world setup.

DeleteApparently, the "lens focal length" isn't needed in camera space, but the transformation uses a magnification multiplier (m = focal_length / z_focus) which depends on it, to transform it to a pixel diameter (or image film diameter). See "Determining a circle of confusion diameter from the object field" at https://en.wikipedia.org/wiki/Circle_of_confusion.

ReplyDeleteNot sure what you are suggesting, but I think it is pretty clear from the page you linked that aperture and focal length are both needed in order to determine CoC size. What I call focus point is S1 in the wikipedia image. The linear depth is S2, and what I call focusScale is dependent on A and f.

DeleteOk, nvm in that case. Thought your "focusScale" is "A".

Delete