
Low level audio

Audio has been a neglected area of our game engine for a long time. Except for a few tweaks, we have used essentially the same code for all our games: a pretty simple channel pool with pitch and volume per channel, with an OpenAL back-end for iOS, Windows and OSX and an OpenSL back-end for Android.

OpenAL is quite okay for iOS and OSX, but requires a third-party installation on Windows. That's not a big issue since we're only using Windows for development, but it's still annoying. A bigger issue is the lack of render-to-file support in almost every audio API I have seen, including OpenAL and OpenSL. That's a pretty obvious feature for a game developer to want, and I can't believe there is so little support for it. We always render out gameplay sequences directly from the game engine to use in trailers, teasers, etc. Because we render in high resolution with motion blur, there is no way to capture the audio in real time along with the video, so audio has to be added manually afterwards in video editing software. Obviously a very time-consuming procedure. A better way would be to render out the game audio frame by frame along with the video, but this requires software mixing and direct access to the final mix - something a hardware-centric audio API cannot offer. Does anyone know if OpenAL and OpenSL actually use hardware mixing these days, or do they just mix in software anyway?
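The frame-by-frame idea can be sketched roughly as below. This is not the engine's actual code: `audioFramesForVideoFrame`, `renderToWav` and `MixFn` are hypothetical names, and the output format (16-bit stereo PCM in a WAV container) is an assumption. The one subtle part is that the audio rate rarely divides evenly by the video frame rate (44100 Hz at 24 fps gives 1837.5 audio frames per video frame), so the remainder is spread across frames to avoid drift.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <vector>

// Number of audio frames that belong to video frame `index`. Computed from
// cumulative positions so rounding never accumulates into drift.
inline std::uint32_t audioFramesForVideoFrame(std::uint32_t sampleRate,
                                              std::uint32_t fps,
                                              std::uint64_t index) {
    assert(fps > 0);
    const std::uint64_t begin = index * sampleRate / fps;
    const std::uint64_t end = (index + 1) * sampleRate / fps;
    return static_cast<std::uint32_t>(end - begin);
}

// Write the low `bytes` bytes of `value` in little-endian order.
static void writeLE(std::FILE* f, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i)
        std::fputc(static_cast<int>((value >> (8 * i)) & 0xFF), f);
}

// Hypothetical hook into the software mixer: fills `frames` interleaved
// stereo 16-bit frames. The buffer is zeroed before each call.
using MixFn = std::function<void(std::int16_t* out, std::uint32_t frames)>;

// Render `videoFrames` worth of audio into a 16-bit stereo WAV file.
// Assumes the total stays below 4 GB (a WAV size field is 32 bits).
bool renderToWav(const char* path, std::uint32_t sampleRate, std::uint32_t fps,
                 std::uint32_t videoFrames, const MixFn& mix) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const std::uint64_t total =
        static_cast<std::uint64_t>(videoFrames) * sampleRate / fps;
    const std::uint32_t dataBytes = static_cast<std::uint32_t>(total * 4);

    std::fwrite("RIFF", 1, 4, f); writeLE(f, 36 + dataBytes, 4);
    std::fwrite("WAVE", 1, 4, f);
    std::fwrite("fmt ", 1, 4, f); writeLE(f, 16, 4);
    writeLE(f, 1, 2);               // format tag: PCM
    writeLE(f, 2, 2);               // channels
    writeLE(f, sampleRate, 4);      // sample rate
    writeLE(f, sampleRate * 4, 4);  // bytes per second
    writeLE(f, 4, 2);               // bytes per frame
    writeLE(f, 16, 2);              // bits per sample
    std::fwrite("data", 1, 4, f); writeLE(f, dataBytes, 4);

    std::vector<std::int16_t> buffer;
    for (std::uint32_t v = 0; v < videoFrames; ++v) {
        // Call this right after rendering video frame v.
        const std::uint32_t n = audioFramesForVideoFrame(sampleRate, fps, v);
        buffer.assign(static_cast<std::size_t>(n) * 2, 0);
        mix(buffer.data(), n);
        for (std::int16_t s : buffer) writeLE(f, static_cast<std::uint16_t>(s), 2);
    }
    return std::fclose(f) == 0;
}
```

In a real capture loop the per-frame body would sit next to the code that writes out each video frame, so game time, video and audio all advance in lockstep regardless of how long a frame takes to render.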

I decided to write my own software mixer with a very thin hardware abstraction layer per platform. Partly because I would get the render-to-file feature basically for free, but it also opens up the possibility to add in effects like echo and reverb.
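To give an idea of how little is needed for the core of such a mixer, here is a minimal sketch of a channel pool with pitch and volume per channel. It is not the actual engine code: the `Channel` and `Mixer` names, the mono float source format and the linear-interpolation resampling are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One voice in the pool (hypothetical layout).
struct Channel {
    const float* samples = nullptr;  // mono source data, owned elsewhere
    std::size_t length = 0;          // number of source samples
    double position = 0.0;           // fractional read position
    float pitch = 1.0f;              // playback rate, 1.0 = original speed
    float volume = 1.0f;             // linear gain
    bool active = false;
};

class Mixer {
public:
    explicit Mixer(std::size_t channelCount) : channels_(channelCount) {}

    // Start a sound on the first free channel.
    // Returns the channel index, or -1 if the pool is full.
    int play(const float* samples, std::size_t length, float pitch, float volume) {
        assert(samples != nullptr && pitch > 0.0f);
        for (std::size_t i = 0; i < channels_.size(); ++i) {
            Channel& c = channels_[i];
            if (c.active) continue;
            c.samples = samples;
            c.length = length;
            c.position = 0.0;
            c.pitch = pitch;
            c.volume = volume;
            c.active = true;
            return static_cast<int>(i);
        }
        return -1;
    }

    // Mix `frames` interleaved stereo frames into `out` (2 * frames floats).
    void mix(float* out, std::size_t frames) {
        for (std::size_t i = 0; i < frames * 2; ++i) out[i] = 0.0f;
        for (Channel& c : channels_) {
            if (!c.active) continue;
            for (std::size_t f = 0; f < frames; ++f) {
                const std::size_t i0 = static_cast<std::size_t>(c.position);
                if (i0 >= c.length) { c.active = false; break; }  // sound finished
                const std::size_t i1 = (i0 + 1 < c.length) ? i0 + 1 : i0;
                const float t = static_cast<float>(c.position - static_cast<double>(i0));
                // Pitch is applied by stepping through the source at a
                // fractional rate and interpolating between neighbours.
                const float s =
                    (c.samples[i0] * (1.0f - t) + c.samples[i1] * t) * c.volume;
                out[2 * f] += s;      // left
                out[2 * f + 1] += s;  // right
                c.position += c.pitch;
            }
        }
    }

private:
    std::vector<Channel> channels_;
};
```

Because the final mix ends up in an ordinary buffer, the same `mix` call can feed either the audio driver or a file writer, and effects like echo or reverb become a pass over that buffer.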

I thought for sure that the hardest part of the mixer project would be to write the actual mixer. I couldn't have been more wrong. It turns out that researching appropriate low-level audio interfaces for each platform took way more time. There are at least four or five different APIs on Windows and just as many on OSX, yet it's surprisingly hard to just stream raw PCM data to the left and right speaker. Extremely frustrating, because this is exactly what the driver wants in the end anyway. Why not just offer a simple callback from a high-priority system thread - feedMoreDataPlease(void* buffer, int size)? I get that some people want 3D positioning, Internet streaming and what-not, but that's not a reason to hide low-level access from the rest of us who just want to submit a raw mix to the driver.
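For illustration, this is roughly what the application side of such a callback could look like. It is only a sketch: no real platform API is used, `ToneSource` is a hypothetical stand-in for the mixer (it just produces a quiet 440 Hz tone), and the buffer is assumed to be interleaved 16-bit stereo at 44.1 kHz with `size` given in bytes.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the software mixer: a quiet 440 Hz sine tone.
struct ToneSource {
    static constexpr double kTwoPi = 6.283185307179586;
    double phase = 0.0;
    double step = kTwoPi * 440.0 / 44100.0;  // assumes 44.1 kHz output

    // Writes `frames` interleaved stereo float frames in the range [-1, 1].
    void mix(float* out, int frames) {
        for (int f = 0; f < frames; ++f) {
            const float s = 0.25f * static_cast<float>(std::sin(phase));
            out[2 * f] = s;
            out[2 * f + 1] = s;
            phase += step;
            if (phase >= kTwoPi) phase -= kTwoPi;
        }
    }
};

// Convert a float sample to 16-bit PCM, clamping instead of wrapping so an
// overly loud mix clips rather than turning into noise.
inline std::int16_t toPcm16(float s) {
    if (s > 1.0f) s = 1.0f;
    if (s < -1.0f) s = -1.0f;
    return static_cast<std::int16_t>(std::lrintf(s * 32767.0f));
}

static ToneSource g_source;

// The callback shape wished for above: the system hands over a buffer and
// its size in bytes, and the application fills it with the final mix.
void feedMoreDataPlease(void* buffer, int size) {
    assert(buffer != nullptr && size >= 0);
    std::int16_t* out = static_cast<std::int16_t*>(buffer);
    int framesLeft = size / 4;  // 2 channels x 2 bytes per sample
    float scratch[2 * 256];     // fixed size: no allocation on the audio thread
    while (framesLeft > 0) {
        const int n = framesLeft < 256 ? framesLeft : 256;
        g_source.mix(scratch, n);
        for (int i = 0; i < 2 * n; ++i) out[i] = toPcm16(scratch[i]);
        out += 2 * n;
        framesLeft -= n;
    }
}
```

The callback does no locking, allocation or I/O, which is what makes it reasonable to run on a high-priority system thread.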

Kudos to Apple for exposing exactly this type of interface through the Audio Unit API (part of Core Audio). It can actually be configured to do what I'm asking for, and offers really good latency too - around 10 ms on both OSX and iOS.

Android doesn't seem to offer any viable alternatives to OpenSL through the NDK. Fortunately OpenSL is also reasonably sane when it comes to streaming and offers latency in the 50 ms range (plus some hidden latency within OpenSL or the drivers, unclear exactly how much). It's not perfect, but acceptable. OpenSL is a beast to set up though. Why on earth would anyone design an API like that? And there is very little help available online except for the 600-page reference manual. I finally got it to work and I hope I never ever need to touch it again.

Windows is the only platform where I still haven't found a decent low-level audio API. There are Media Foundation and WASAPI, which I haven't looked at because they are available on Windows 7 and Vista only. DirectSound is probably the most widely used, but it doesn't seem to offer a direct callback. Instead it relies on a user thread to feed new buffers periodically (making it virtually useless for low-latency stuff due to the horrible Windows scheduler). There is also the old waveOut interface in WinMM, which at first glance looks like a perfect match - it offers a callback when the driver needs more data. But here's the catch - you are not allowed to feed audio data from the callback itself because it may cause the process to deadlock! You are supposed to do this from a separate thread, and the callback may only communicate with it through certain "safe" system functions. I'm totally ignoring that for the time being and feeding audio data from the callback, and it seems to work great in Windows 7 at least (I suppose this is because the waveOut interface is deprecated and wrapped in fifteen layers of user-mode code at this point...). The latency with the waveOut method ended up being in the 30 ms range.
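To put the latency figures in perspective, the part of the latency that the application controls is just the amount of audio queued ahead of the hardware. The helpers below are a back-of-the-envelope sketch with hypothetical names, assuming a 44.1 kHz output rate (the actual rate and buffer counts used on each platform are not stated here); they ignore any hidden latency inside the API or the driver.

```cpp
#include <cassert>
#include <cmath>

// Latency in milliseconds contributed by `bufferCount` queued buffers of
// `framesPerBuffer` frames each.
inline double bufferLatencyMs(int framesPerBuffer, int bufferCount, int sampleRate) {
    assert(framesPerBuffer > 0 && bufferCount > 0 && sampleRate > 0);
    return 1000.0 * framesPerBuffer * bufferCount / sampleRate;
}

// Number of frames that may be queued in total to stay within `ms` of latency.
inline int framesForLatencyMs(double ms, int sampleRate) {
    assert(ms > 0.0 && sampleRate > 0);
    return static_cast<int>(std::floor(ms * sampleRate / 1000.0));
}
```

Under that assumption, 10 ms corresponds to 441 queued frames, 30 ms to 1323 and 50 ms to 2205, so two 512-frame buffers (about 23 ms) would already land between the Core Audio and waveOut figures.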

It took some experimenting to get decent performance out of the software mixer, but since it all happens on a separate thread I'm not too concerned, these days when even mid-range mobile phones have multiple cores...


  1. Hey Dennis, did you look at XAudio2? It sounds like you want SubmitSourceBuffer.


  2. The OpenAL implementation on Android does all mixing in C and then just submits raw PCM audio to the Java API. Scrap OpenSL on Android; it sucks and provides zero benefit over what the OpenAL implementation does. In particular, it does not provide lower latency. Just check the implementation and you'll see that it uses the same media framework as the APIs the OpenAL implementation on Android uses.

  3. Hi Matt, yes that does indeed sound very good. I will take a look at that. Thanks!

    MrVol: I didn't know there was an official OpenAL implementation for Android. Which one are you using?

  4. I like that. I try to it, and it continueable.

