Skip to main content

Hail to the hall - Environmental Acoustics

One of our early goals with Smash Hit was to combine audiovisual realism with highly abstract landscapes and environments. A lot of effort was put into making realistic shadows and visuals, and our sound designer spent long hours finding the perfect glass breaking sound. However, without proper acoustics to back up the different environments, the sense of presence simply would be there.

To achieve full control over the audio processing and add environmental effects I needed to do all the mixing myself. Platform dependent solutions like OpenAL and OpenSL cannot be trusted here, because support for environmental effects is device/firmware specific and missing in most mobile implementations. Even it was available it would be virtually impossible to reliably map parameters between OpenSL and OpenAL. As in most cases with multi-platform game development, DIY is the way to go.

Showcasing a few different acoustic environments

Software mixer

Writing a software mixer is quite rewarding – a small, well defined task with a handful operations performed on a large chunk of data, thus very suitable for SIMD optimization. A software mixer is also one of those few subsystems that has, or can have, a real work analogue - the physical audio mixer. I chose a conventional, physical abstraction, so my interface classes are named Mixer, Channel, Effect, etc, but there might better ways to structure it.

The biggest hurdle when writing a software mixer turned out to be the actual mixing. Two samples playing at the same time are added together, but what happens if they both play at maximum volume? The intuitive implementation, and what also happens in the real world is clipping. This is what most real audio programs do. Clipping is a form of distortion, where minimum and maximum audio levels are simply clamped above or below the physical threshold, effectively destroying or reshaping the waveform. In an audio software, you would typically adjust the levels manually in order to avoid clipping, but in games, where audio is interactive this can be really tricky. Say for instance you have a click sound for buttons. If there are no other sounds playing you want the click played back at maximum volume, but if there is music in the background the volume level needs to be lowered. If there is an explosion nearby it needs to be adjusted even further to avoid clipping.

One way to reduce clipping is to transform the output signal in a non-linear fashion, so that it never really reaches the maximum level. This still has the problem that it will affect the result when there is only one sample playing. Hence, when the click is played back in isolation it won't be at maximum volume.

Some people suggest that the output should be averaged in the case of multiple channels. So if there are three sounds, A B C playing at once. You mix them as (A+B+C)/3. This is not a good way to do it, because the formula doesn't know anything about the content of each channel (B and C can for instance be silent, still resulting in A played back at a third of the volume).

What we need is some form of audio compression - an algorithm that compress audio dynamically, based on the current levels. Real audio compressors are pretty advanced, with a sliding window to analyze the current audio content and adjust the levels accordingly. Fortunately there is a "magic formula" that sounds good enough in most cases. I found this solution by Viktor T. Toth: mix=A+B-A*B, but when adapting it to floating point math I realized that a slight modification into: mix=A+B-abs(A)*B is more suitable to better deal with negative numbers. Each channel is added to the mix separately, one at a time, using the following pseudo code:

mix = 0
for each channel C
  mix = mix+C - abs(mix)*C

This means that if there is only one channel playing, it will pass unmodified through the mixer. The same applies if there are two channels but one is completely silent. If both channels have the maximum value (1.0), the result will be 1.0, and anything in between will be compressed dynamically. It is definitely not the best or most accurate way to do it, but considering how cheap it is, it sounds amazingly good. I use this for all mixing in Smash Hit, and there are typically 10-20 channels playing simultaneously, so it does handle complex scenarios quite well.

I use three separate mixers in Smash Hit – the HUD mixer, which is used for all button clicks and menu sounds, the gameplay mixer, which represent all 3D sounds, and the music mixer which is used for streaming music. The gameplay mixer has a series of audio effects attached to it to emulate the acoustics of different room types.


Given how useful a reverb effect is in game development, it's quite surprising to me how difficult it was to find any implementations or even an explanation online. At a first glance, the reverb effect seems much like a long series of small echoes, but when trying it, the result sounds exactly like that – a long series of small echoes, not the warm, rich acoustics of a big church. If one tries to make the echoes shorter, it turns more and more metallic, like being inside a sewage pipe.

There is a great series of blog posts about digital reverberation by Christian Floisand that contains a lot of the theory and also a practical implementation: Digital reverberation and Algorithmic Reverbs: The Moorer Design.

It uses a series of parallel comb filters passed through all-pass filters in series. The comb filters is basically a short delay line with feedback, representing reflected sounds, while the all-pass filter are used to thicken and diffuse the reflected sound by altering the phase. I don't know enough signal theory to fully understand the all-pass filter, but it works great and implementation is fairly easy.

In addition to the comb filters and all-pass filters I also added a couple of tap-delays (delay line without feedback), representing early reflections on hard surfaces, as well as low-pass filters in each comb filter allowing a great way to control the room characteristics. Christian's article suggest the use of six comb filters, but for performance reasons I cut it down to four. I'm using four tap-delays and two all-pass filter, plus a pre-delay on the entire late reflection network.

All audio is processed in stereo in Smash Hit, so the reverb needs to be processed separately on the left and right channel. I slightly randomize the loop time in the comb filters differently for the left and right channel, which gives the final mix a very nice stereo spread and a much better sense of presence.

In addition to reverb I also implemented a regular echo as well as a low-pass filter. The parameters of these three filters are used to give each room its unique acoustics.


  1. Thanks again for this - interesting (as always). I'll certainly be trying the mix=A+B-abs(A)*B trick!

  2. An unrelated thing I'd be interested to hear about is how do you feel about this monetization scheme you employed, where the entire game is available for free and the main point of the upgrade is saving progress. I suppose it gave your game a popularity boost, due to huge number of free users. But was it a good decision in terms of revenue? (not the most important thing, I know, and with your success you probably don't need to worry about running out of money). Asking for a conversion rate of Smash Hit would be too much I guess, but... never hurts to ask. :)

    1. To be honest, it was a bit of a wildcard, but it turned out well. I don't think it's completely fair to compare directly with our previous premium titles, but after two months, Smash Hit was already our most profitable game so far.

  3. it's the price of a bland coffee at the local cafe and already i am enjoying it much more. Or should i say still.

    no access to fpu acceleration for more accurate/dynamic sound? I mean sound with more dynamics. Perhaps a 'headphone' mode where you don't do any compression but allow sounds from very quiet to very loud? it would be amusingly visceral to hear massive slabs of glass shatter right next to your ears

  4. Hi dennis,

    I am Sameer an indie developer.
    I am implementing a 2D water simulation for a game.
    I need your little help with the stretching,aligning of the particles
    which you described in your blog i.e. tuxedoabs.
    As when my water particles separated from the main flow it looks like
    circles (as box2d circles).
    I tried implementing 2 FBOs and decreasing the opacity of the one fbo.
    but in Adreno GPUs, framebuffer switching is very expensive, I cant use this.
    Will you please help me implementing this?
    I just need little help from you.

    skypeid: sameertripathy2

  5. What you do use to compile de game to iOS? Do you do everything in C++ than load in Xcode project?

    1. Exactly, I have an XCode project with a small Objective-C wrapper that basically just calls my C++ program and handles I/O.


Post a Comment

Popular posts from this blog

Bokeh depth of field in a single pass

When I implemented bokeh depth of field I stumbled upon a neat blending trick almost by accident. In my opinion, the quality of depth of field is more related to how objects of different depths blend together, rather than the blur itself. Sure, bokeh is nicer than gaussian, but if the blending is off the whole thing falls flat. There seems to be many different approaches to this out there, most of them requiring multiple passes and sometimes separation of what's behind and in front of the focal plane. I experimented a bit and stumbled upon a nice trick, almost by accident.

I'm not going to get into technical details about lenses, circle of confusion, etc. It has been described very well many times before, so I'm just going to assume you know the basics. I can try to summarize what we want to do in one sentence – render each pixel as a discs where the radius is determined by how out of focus it is, also taking depth into consideration "somehow".

Taking depth into…

Screen Space Path Tracing – Diffuse

The last few posts has been about my new screen space renderer. Apart from a few details I haven't really described how it works, so here we go. I split up the entire pipeline into diffuse and specular light. This post will focusing on diffuse light, which is the hard part.

My method is very similar to SSAO, but instead of doing a number of samples on the hemisphere at a fixed distance, I raymarch every sample against the depth buffer. Note that the depth buffer is not a regular, single value depth buffer, but each pixel contains front and back face depth for the first and second layer of geometry, as described in this post.

The increment for each step is not view dependant, but fixed in world space, otherwise shadows would move with the camera. I start with a small step and then increase the step exponentially until I reach a maximum distance, at which the ray is considered a miss. Needless to say, raymarching multiple samples for every pixel is very costly, and this is without …

A better depth buffer for raymarching

When doing any type of raymarching over a depth buffer, it is very easy to determine if there is no occluder – the depth in the buffer is farther away than the current point on the ray. However, when the depth in the buffer is closer you might be occluded or you might not, depending on a) the thickness of the occluder and b) if there are any other occluders behind the first one and their thickness. It seems most people assume a) is either infinite or a constant value and b) is ignored alltogether.

Since my new renderer is entirely based around screen space raymarching I wanted to improve on this to make it more accurate. This has been done before, but mostly in the context of order independent transparency (I think).

Let's look at a scene where the occluders are assumed to have infinite depth (I have tweaked the lighting for more distinct shadows to get a better look at raymarching artefacts, so the lighting does not exactly match the environment in these screenshot).

At a first …