Wednesday, April 2, 2014

Smashing tech

Our latest game Smash Hit has gone far beyond all expectations, being the #1 free game in over 100 countries during launch week and approaching 35 million downloads! I will write several blog posts about the technology here, starting with a summary of what is being used, then going deeper into each subject in future posts.



Physics
This is by far our most physics-intensive game to date. It's almost like a physics playground with a game glued on top. The physics engine is tailor-made for this game specifically, but builds on top of the low-level physics library I was working on a few years ago. The game is actually a great showcase for the low-level physics library since it is very non-generic. It's a streaming, highly dynamic world where more or less everything is moving all the time and objects get inserted and removed constantly. Therefore there is no deactivation (sleeping) or island generation. There are also two types of objects simulated differently: full rigid bodies, plus debris, which is more lightweight.
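
Purely as an illustration (none of this is from the actual engine - all names and fields are invented), a two-tier simulation like that could be sketched as follows, with full rigid bodies going through the complete solve and debris getting a cheap ballistic path with no rotational state at all:

#include <vector>

struct Vec3 { float x = 0, y = 0, z = 0; };

struct RigidBody {
    Vec3 pos, vel;
    // orientation, angular velocity, inertia, contacts, constraints, ...
};

struct Debris {
    Vec3 pos, vel;   // no rotational state at all
};

void step(std::vector<RigidBody>& bodies, std::vector<Debris>& debris, float dt) {
    // Full pipeline for rigid bodies: broadphase, contacts, iterative solver
    // (omitted here - this is where most of the time goes).
    for (RigidBody& b : bodies) {
        b.pos.x += b.vel.x * dt;
        b.pos.y += b.vel.y * dt;
        b.pos.z += b.vel.z * dt;
    }

    // Cheap debris path: gravity plus integration, little else.
    for (Debris& d : debris) {
        d.vel.y -= 9.82f * dt;
        d.pos.x += d.vel.x * dt;
        d.pos.y += d.vel.y * dt;
        d.pos.z += d.vel.z * dt;
    }
}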

Destruction
Destruction is the core game mechanic and had to be fully procedural, very robust and with predictable performance. The engine supports compounds of convex shapes, like most physics engines. These shapes are then split with planes and glued back together when shattered. Though most objects in the game are flat, the breakage actually supports full 3D objects with no limitations. The breaking mechanic is built into the core solver, so that objects can break in multiple steps during the same iteration. This is essential for good breakage of this magnitude.
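
As a sketch of the idea only (not the actual solver), breakage inside the solver loop might look something like this, with glue joints checked every iteration so that a piece that just broke free can break again within the same time step:

#include <vector>

// Invented types for illustration; the real solver is far more involved.
struct GlueJoint {
    float accumulatedImpulse = 0.0f;
    float breakThreshold = 100.0f;  // made-up units
    bool broken = false;
};

void solve(std::vector<GlueJoint>& glue, int iterations) {
    for (int iter = 0; iter < iterations; ++iter) {
        // 1. Solve contacts and joints, accumulating impulses (omitted).

        // 2. Check glue joints *inside* the iteration loop, so a compound
        //    can shatter and its pieces shatter again in later iterations
        //    of this very same time step.
        for (GlueJoint& g : glue) {
            if (!g.broken && g.accumulatedImpulse > g.breakThreshold) {
                g.broken = true;
                // splitCompound(g);  // split shapes, spawn new bodies
            }
        }
    }
}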

Graphics
Due to the highly dynamic environment, where there can be hundreds of moving objects at the same time, one draw call per object was not an option. Instead, all objects are gathered into dynamic vertex buffers, so there is basically only one draw call per material. Vertex transformation is done on the CPU to offload the GPU and allow culling before vertices and triangles are even sent to the GPU. CPU transformation also opens up a few other tricks not available with conventional rendering. The camera is facing the same direction all the time, which allows the use of billboards to approximate geometry. You can see this in a few instances, for round shapes in particular, throughout the game.
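
A minimal sketch of this kind of batching (the GL calls are real, everything else is invented for illustration; attribute setup and culling are omitted):

#include <vector>
#include <GLES2/gl2.h>  // or the equivalent GL header on desktop

struct Vertex { float x, y, z, u, v; };

struct Object {
    std::vector<Vertex> localVerts;
    float worldMatrix[16];  // column-major, GL convention
};

static Vertex transform(const Vertex& v, const float m[16]) {
    Vertex out = v;
    out.x = m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12];
    out.y = m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13];
    out.z = m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14];
    return out;
}

// Transform every object's vertices on the CPU into one big per-material
// buffer, then issue a single draw call for all of them.
void drawMaterialBatch(GLuint vbo, const std::vector<Object>& objects) {
    std::vector<Vertex> batch;
    for (const Object& obj : objects) {
        // Cull here, before anything is sent to the GPU (omitted).
        for (const Vertex& v : obj.localVerts)
            batch.push_back(transform(v, obj.worldMatrix));
    }
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, batch.size() * sizeof(Vertex),
                 batch.data(), GL_STREAM_DRAW);  // dynamic, rewritten each frame
    glDrawArrays(GL_TRIANGLES, 0, (GLsizei)batch.size());
}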



Shadows
The static soft shadows are precomputed vertex lighting based on ambient occlusion. Lighting is quadratically interpolated in the fragment shader for a natural falloff. The dynamic soft shadows are Gaussian blobs rendered with one quad per rigid body. The size and orientation of the shadow need to be determined at runtime, since an object can break arbitrarily. I'm using the inertia tensor of the rigid body to figure this out, and the shadow is then projected down onto a plane using a downward raycast. This is of course an enormous simplification, but it looks great in 99% of all cases!
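
The post doesn't say exactly how the tensor is used, but one plausible way to recover approximate object extents from the principal moments of inertia (a guess at the technique, using the solid-box formulas and assuming the tensor is diagonal in body space) is:

#include <cmath>
#include <algorithm>

// For a solid box of mass m and full extents (w, h, d):
//   Ix = m/12 (h^2 + d^2), Iy = m/12 (w^2 + d^2), Iz = m/12 (w^2 + h^2)
// Solving for the extents gives the formulas below.
void extentsFromInertia(float m, float Ix, float Iy, float Iz,
                        float& w, float& h, float& d) {
    w = std::sqrt(std::max(0.0f, 6.0f / m * (Iy + Iz - Ix)));
    h = std::sqrt(std::max(0.0f, 6.0f / m * (Ix + Iz - Iy)));
    d = std::sqrt(std::max(0.0f, 6.0f / m * (Ix + Iy - Iz)));
    // The shadow quad can then be scaled by these extents, oriented along
    // the body's principal axes, and placed on the ground plane found by a
    // downward raycast.
}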

Music and sound
I wrote my own software mixing layer for this game, which enables custom sound effects processing for environmental acoustics. I use reverb, echoes and low-pass filters with different settings for each environment in the game. The music is made up of about 30 different patterns, each with an intro and an outro, which are mixed together sample-accurately during the transitions. The camera motion is synchronized to the music progression, so the music always switches to the next pattern exactly when entering a new room. This was pretty hard to get right, since it had to be done independently of the main time stepping in order to support slower devices. Hence, camera motion and physics simulation had to be completely decoupled in order to have both predictable simulation and music synchronization on all devices.
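
A minimal sketch of sample-accurate pattern switching inside a software mixer (the structure is invented just to illustrate the idea; the intro/outro overlap is omitted):

#include <cstdint>
#include <vector>

struct Pattern { std::vector<float> samples; };  // mono for brevity

struct MusicPlayer {
    const Pattern* current = nullptr;
    const Pattern* next = nullptr;
    uint64_t cursor = 0;        // absolute sample position
    uint64_t switchAt = 0;      // sample index where the next pattern starts

    // Switch patterns on an exact sample index rather than "whenever the
    // next audio callback happens to arrive".
    void mix(float* out, int frames) {
        for (int i = 0; i < frames; ++i, ++cursor) {
            if (next && cursor == switchAt) {  // sample-exact transition
                current = next;
                next = nullptr;
            }
            if (current) {
                const std::vector<float>& s = current->samples;
                out[i] += s[cursor % s.size()];
            }
        }
    }
};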



Scripting
Scripting has been a very useful tool during the development of this game. Each obstacle in the game is built and animated using a separate Lua script. Since each obstacle is procedurally generated, it allows us to make many different variations of the same obstacle, for instance configuring width, height and color, or the number of blades in a fan, etc. Each obstacle runs within its very own Lua context, so it is a completely safe sandbox environment. I've configured Lua to minimize memory consumption, and implemented an efficient custom memory allocator, so each context only requires a single memory block of about 40 kb, and there are a few dozen of them active at the same time at most. Garbage collection is amortized to only run for one context each frame, so the performance impact is minimal.
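
The allocator details aren't described, but here is a sketch of the general approach using the real Lua C API: a custom lua_Alloc that enforces a per-context budget (the actual single-block allocator would be more elaborate than this), plus one incremental GC step on a single context per frame:

#include <cstdlib>
#include <vector>
extern "C" {
#include <lua.h>
}

// Per-context budget, standing in for the single ~40 kb block described above.
struct LuaBudget { size_t used = 0, cap = 40 * 1024; };

// Real lua_Alloc signature; the budgeting policy itself is invented.
static void* budgetAlloc(void* ud, void* ptr, size_t osize, size_t nsize) {
    LuaBudget* b = static_cast<LuaBudget*>(ud);
    size_t old = ptr ? osize : 0;   // osize is only meaningful when ptr != NULL
    if (nsize == 0) { free(ptr); b->used -= old; return nullptr; }
    if (b->used - old + nsize > b->cap) return nullptr;  // out of budget
    void* p = realloc(ptr, nsize);
    if (p) b->used = b->used - old + nsize;
    return p;
}

// Amortized GC: one incremental step on a single context per frame.
void collectOneContext(std::vector<lua_State*>& contexts, int frame) {
    if (!contexts.empty())
        lua_gc(contexts[frame % contexts.size()], LUA_GCSTEP, 0);
}

// Creating a context: lua_State* L = lua_newstate(budgetAlloc, new LuaBudget);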

Performance
The game is designed for multicore devices and uses a fork-and-merge approach for both physics and graphics. I considered putting the rendering on a separate background thread, but this would incur an extra frame of latency, which is really bad for an action game. The audio mixing and sound decoding are done on separate threads.
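
Fork-and-merge here presumably means splitting one frame's work across cores and joining before moving on, as opposed to pipelining across frames. A minimal sketch with std::async (not the actual job system):

#include <future>
#include <vector>

// Fork a parallel-friendly stage across worker tasks, then merge before the
// next stage. No cross-frame pipelining, so no added latency.
template <typename Fn>
void forkAndMerge(int jobs, Fn work) {
    std::vector<std::future<void>> tasks;
    for (int i = 0; i < jobs; ++i)
        tasks.push_back(std::async(std::launch::async, work, i, jobs));
    for (std::future<void>& t : tasks)  // merge: wait for every fork
        t.get();
}

// Usage per frame (hypothetical stage functions):
//   forkAndMerge(4, [&](int i, int n) { integrateBodies(i, n); });
//   forkAndMerge(4, [&](int i, int n) { transformVertices(i, n); });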

If there is any area you find particularly interesting, let me know!

Tuesday, January 7, 2014

GDC Physics Tutorial

I will give a talk on fluid simulation at GDC this year! Make sure to attend the physics tutorial.

The session will focus on the formulation of a fluid constraint. In contrast to most other particle-based fluid simulators, mine uses a sequential impulse solver, normally found in rigid body engines. This improves incompressibility and makes interaction with rigid bodies very stable. This is the method used in Sprinkle and all the 3D fluid movies posted earlier on the blog.
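
The talk will have the real formulation; as a rough cartoon of the idea only, here is a toy version where each close particle pair is treated as a contact-like constraint solved with sequential impulses (the actual method constrains fluid density per particle, not pairwise distance):

#include <cmath>
#include <vector>

struct Particle { float x, y, vx, vy; };

// Toy sketch: sequential impulses on pairwise "keep apart" constraints.
void solvePairs(std::vector<Particle>& p, float restDist, int iterations) {
    for (int it = 0; it < iterations; ++it) {
        for (size_t i = 0; i < p.size(); ++i)
            for (size_t j = i + 1; j < p.size(); ++j) {
                float dx = p[j].x - p[i].x, dy = p[j].y - p[i].y;
                float d = std::sqrt(dx * dx + dy * dy);
                if (d >= restDist || d < 1e-6f) continue;  // not compressed
                float nx = dx / d, ny = dy / d;
                // Relative velocity along the constraint direction.
                float vn = (p[j].vx - p[i].vx) * nx + (p[j].vy - p[i].vy) * ny;
                // Stop the approach, plus a bias pushing out of compression.
                float lambda = -0.5f * (vn + 0.2f * (d - restDist));
                if (lambda < 0.0f) continue;  // only push apart
                p[i].vx -= lambda * nx; p[i].vy -= lambda * ny;
                p[j].vx += lambda * nx; p[j].vy += lambda * ny;
            }
    }
}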

The tutorial also includes interesting talks about convex hull creation, physics debugging, constraint solvers and character collision.


Tuesday, June 25, 2013

Low level audio


Audio has been a neglected area of our game engine for a long time. Except for a few tweaks, we have essentially used the exact same code for all our games: a pretty simple channel pool with pitch and volume per channel, an OpenAL back-end for iOS, Windows and OSX, and an OpenSL back-end for Android.

OpenAL is quite okay for iOS and OSX, but requires a third-party installation on Windows. That's not a big issue since we're only using Windows for development, but still annoying. A bigger issue is the lack of render-to-file support in almost every audio API I have seen, including OpenAL and OpenSL. That's a pretty essential feature for a game developer; I can't believe there is so little support for it. We always render out gameplay sequences directly from the game engine to use in trailers, teasers, etc. Because we render in high resolution with motion blur, there is no way to capture the audio in real-time along with the video, so audio has to be manually added later in video editing software. Obviously a very time-consuming procedure. A better way would be to render out the game audio frame by frame along with the video, but this requires software mixing and direct access to the final mix - something a hardware-centric audio API cannot offer. Does anyone know if OpenAL and OpenSL actually use hardware mixing these days, or do they just mix in software anyway?

I decided to write my own software mixer with a very thin hardware abstraction layer per platform. Partly because I would get the render-to-file feature basically for free, but it also opens up the possibility to add effects like echo and reverb.
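
This is roughly why render-to-file comes for free with a software mixer: the same mix function can feed either the device callback or a file writer, one frame's worth of samples at a time (a sketch with invented names):

#include <cstdio>
#include <vector>

// One mix function, two consumers.
void mixFrame(float* buffer, int samples) {
    for (int i = 0; i < samples; ++i) buffer[i] = 0.0f;  // sum voices, apply effects
}

// Real-time path: called by the platform's audio callback.
void onAudioCallback(float* deviceBuffer, int samples) {
    mixFrame(deviceBuffer, samples);
}

// Offline path: called once per rendered video frame (e.g. 1/60 s of audio),
// in lockstep with the frame renderer, then written to disk.
void renderAudioFrame(FILE* out, int sampleRate, int fps) {
    std::vector<float> buffer(sampleRate / fps * 2);  // stereo interleaved
    mixFrame(buffer.data(), (int)buffer.size());
    fwrite(buffer.data(), sizeof(float), buffer.size(), out);
}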

I thought for sure that the hardest part of the mixer project would be to write the actual mixer. I couldn't have been more wrong. It turns out researching appropriate low-level audio interfaces for each platform took way more time. There are at least four or five different APIs on Windows and just as many on OSX, yet it's surprisingly hard to just stream raw PCM data to the left and right speakers. Extremely frustrating, because this is exactly what the driver wants in the end anyway. Why not just offer a simple callback from a high-priority system thread - feedMoreDataPlease(void* buffer, int size)? I get that some people want 3D positioning, Internet streaming and what-not, but that's not a reason to hide low-level access from the rest of us who just want to submit a raw mix to the driver.

Kudos to Apple for exposing exactly this type of interface through the Audio Unit API (part of Core Audio). It can actually be configured to do what I'm asking for, and offers really good latency too - around 10 ms on both OSX and iOS.
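
The key piece is the render callback (the Core Audio API calls are real; the setup boilerplate around it is abbreviated and should be treated as a sketch):

#include <AudioUnit/AudioUnit.h>

// The system calls this from a high-priority thread whenever it needs more
// samples - essentially the feedMoreDataPlease() wished for above.
static OSStatus renderCallback(void* inRefCon,
                               AudioUnitRenderActionFlags* ioActionFlags,
                               const AudioTimeStamp* inTimeStamp,
                               UInt32 inBusNumber,
                               UInt32 inNumberFrames,
                               AudioBufferList* ioData) {
    float* out = (float*)ioData->mBuffers[0].mData;
    // mixFrame(out, ...);  // fill with the final software mix
    for (UInt32 i = 0; i < inNumberFrames; ++i) out[i] = 0.0f;
    return noErr;
}

// Hooked up during setup with something like:
//   AURenderCallbackStruct cb = { renderCallback, userData };
//   AudioUnitSetProperty(unit, kAudioUnitProperty_SetRenderCallback,
//                        kAudioUnitScope_Input, 0, &cb, sizeof(cb));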

Android doesn't seem to offer any viable alternatives to OpenSL through the NDK. Fortunately OpenSL is also reasonably sane when it comes to streaming, and offers latency in the 50 ms range (plus some hidden latency within OpenSL or the drivers - unclear exactly how much). It's not perfect, but acceptable. OpenSL is a beast to set up though. Why on earth would anyone design an API like that? And there is very little help available online except for the 600-page reference manual. I finally got it to work, and I hope I never ever need to touch it again.

Windows is the only platform where I still haven't found a decent low-level audio API. There are Media Foundation and WASAPI, which I haven't looked at because they're Vista and Windows 7 only. DirectSound is probably the most widely used, but it doesn't seem to offer a direct callback. Instead it relies on a user thread to feed new buffers periodically (making it virtually useless for low-latency stuff due to the horrible Windows scheduler). There is also the old waveOut interface in WinMM, which at first glance looks like a perfect match - it offers a callback when the driver needs more data, but here's the catch: you are not allowed to feed audio data from the callback itself because it may cause the process to deadlock! You are supposed to do this from a separate thread and can only communicate with certain "safe" system functions. I'm totally ignoring that for the time being, feeding audio data from the callback, and it seems to work great in Windows 7 at least (I suppose this is because the waveOut interface is deprecated and wrapped in fifteen layers of user-mode code at this point...). The latency with the waveOut method ended up being in the 30 ms range.
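
For reference, the waveOut pattern described above looks roughly like this (the WinMM API calls are real; note that refilling inside the callback is exactly what the documentation warns against, as discussed):

#include <windows.h>
#include <mmsystem.h>
#pragma comment(lib, "winmm.lib")

static short buffer[2][4096];       // two buffers, ping-ponged
static WAVEHDR headers[2];
static HWAVEOUT device;

static void fillAndSubmit(int i) {
    // mixFrame(buffer[i], 4096);   // write the final software mix here
    waveOutWrite(device, &headers[i], sizeof(WAVEHDR));
}

// Called by the driver when a buffer finishes playing. Officially you must
// not call waveOutWrite from here - this sketch does it anyway, as the post
// describes.
static void CALLBACK onDone(HWAVEOUT, UINT msg, DWORD_PTR,
                            DWORD_PTR p1, DWORD_PTR) {
    if (msg == WOM_DONE)
        fillAndSubmit(((WAVEHDR*)p1 == &headers[0]) ? 0 : 1);
}

// Setup (abbreviated): waveOutOpen(&device, WAVE_MAPPER, &fmt,
//   (DWORD_PTR)onDone, 0, CALLBACK_FUNCTION); then point each header's
//   lpData/dwBufferLength at its buffer, waveOutPrepareHeader both, and
//   kick off playback by submitting them once.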

It took some experimenting to get decent performance out of the software mixer, but since it all happens on a separate thread, I'm not too concerned these days when even mid-range mobile phones have multiple cores...


Thursday, March 28, 2013

Uniformly distributed points on sphere

Sooner or later everybody will need uniformly distributed points on a sphere. There doesn't seem to be a standard method for doing this, so I wrote a very simple iterative algorithm that pushes verts away from each other while continuously normalizing the point data. This will eventually find a stable state where the distance between any two neighboring points is very similar. Performance is terrible, but it gets the job done, so only use this for offline stuff. There is also an option for distributing points on a hemisphere (y>0). Set the number of iterations to at least the number of input points for a good distribution.
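
The linked file has the real implementation; the core idea is roughly this (my own reconstruction from the description above, with made-up constants):

#include <cmath>
#include <vector>

struct V3 { float x, y, z; };

// Push every point away from every other point, then re-normalize back onto
// the unit sphere, repeating until the configuration relaxes.
void distribute(std::vector<V3>& pts, int iterations, float step = 0.01f) {
    for (int it = 0; it < iterations; ++it) {
        for (size_t i = 0; i < pts.size(); ++i) {
            for (size_t j = 0; j < pts.size(); ++j) {
                if (i == j) continue;
                float dx = pts[i].x - pts[j].x;
                float dy = pts[i].y - pts[j].y;
                float dz = pts[i].z - pts[j].z;
                float d2 = dx * dx + dy * dy + dz * dz + 1e-9f;
                // Repulsion falling off with squared distance.
                pts[i].x += step * dx / d2;
                pts[i].y += step * dy / d2;
                pts[i].z += step * dz / d2;
            }
            float len = std::sqrt(pts[i].x * pts[i].x + pts[i].y * pts[i].y +
                                  pts[i].z * pts[i].z);
            pts[i].x /= len; pts[i].y /= len; pts[i].z /= len;
            // For the hemisphere option: pts[i].y = std::fabs(pts[i].y);
        }
    }
}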

Source code: uniformpoints.cpp



Sunday, March 24, 2013

Convex Hulls Revisited

I have written about 3D convex hull generation here before. I find it a very appealing problem because it is so well defined and to my knowledge there is no de facto standard algorithm or implementation. I come back to this topic every now and then since I need a good implementation myself.

Quickhull is probably the most popular algorithm, but it is hard to implement in a robust way. The qhull implementation has a somewhat questionable license and, more significantly, it is a really complex piece of software containing a bunch of other features. I'm on a quest to create a fast, robust convex hull generator that is free to use and is self-contained in a single cpp file.

I'm currently experimenting with an algorithm based on the support mapping, often used in physics and collision detection. The support mapping of a point cloud for a given direction is the point that is farthest in that direction, which simply means finding the point with maximum dot(dir, point). The supporting point for any direction is guaranteed to be on the convex hull of the point cloud. Very convenient.
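
In code, the support mapping for a point cloud is just a few lines (a generic sketch):

#include <vector>

struct V3 { float x, y, z; };

static float dot(const V3& a, const V3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Farthest point in direction dir; guaranteed to lie on the convex hull.
int support(const std::vector<V3>& points, const V3& dir) {
    int best = 0;
    for (int i = 1; i < (int)points.size(); ++i)
        if (dot(points[i], dir) > dot(points[best], dir))
            best = i;
    return best;
}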

The algorithm starts with the smallest possible closed mesh - three vertices connected by two triangles facing in opposite directions. Any three points will do, as long as they are on the convex hull (supporting vertices). Each triangle is then expanded by the supporting vertex in the normal direction. This is done by splitting the triangle into three triangles, all connected to the new vertex. This expansion step may cause the mesh to become concave, so it needs to be followed by an unfolding step, where concave edges are "flipped" to make the mesh convex again.

Flipping one edge may cause nearby edges to become concave, so this step needs to be repeated until all edges are convex, effectively "untangling" any wrinkles introduced by the expansion. Below is a short clip visualizing the construction of a hull through a series of expansion and unfolding steps. For clarity, there are only 12 points and they are all on the convex hull.



The interesting thing about this method is that it is based primarily on topology. Both the expansion and the unfolding step guarantee that the mesh is kept well-defined and closed, so there are no degenerate cases. The only critical computation is how to determine whether an edge is convex or not. I'm still investigating the most robust alternative here. My current one does not deal with all degenerate cases, but I'm pretty sure this can be done.
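
A common way to phrase that test (my sketch, not necessarily the robust version the post is after): an edge is convex if the apex of one neighboring triangle does not lie in front of the other triangle's plane.

struct V3 { float x, y, z; };

static V3 sub(const V3& a, const V3& b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }
static V3 cross(const V3& a, const V3& b) {
    return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x};
}
static float dot(const V3& a, const V3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Edge (e0, e1) shared by triangles (e0, e1, apexA) and (e1, e0, apexB),
// both wound counter-clockwise seen from outside. The edge is convex if
// apexB is on or behind the plane of the first triangle.
bool edgeIsConvex(const V3& e0, const V3& e1, const V3& apexA, const V3& apexB,
                  float eps = 1e-6f) {
    V3 n = cross(sub(e1, e0), sub(apexA, e0));  // first triangle's normal
    return dot(n, sub(apexB, e0)) <= eps;       // the tolerance hides the hard part
}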

The algorithm has a number of desirable properties:
  • Can be used with a tolerance in the expansion step for automatic simplification.
  • Relatively easy to implement (mine is about 400 lines of code).
  • Handles co-planar and degenerate input data.
  • The output mesh is built entirely from the input vertices.
  • Can be modified to use any support mapping function, not just point clouds.
I'm not entirely sure this is a novel idea. I'd actually be surprised if it is, given its simplicity, but I haven't seen any references to it before. Please write a comment if you know. I'll get back with more details and a performance comparison later.

Friday, March 8, 2013

Mediocre properties


I think most games use some kind of property system as a way to expose, edit and serialize parameters for game objects. There are of course very many ways to implement it, but here is how I'm currently doing it.

Each game object that is big enough to carry properties has a PropertyBag instance. The types of properties should ideally be set up per class, not per object, but that requires extra code and I always strive to keep the amount of code to a minimum. Hence, the construction of a property bag is done in the object constructor and might look like this:

mProperties.add("density", "1.0");
mProperties.add("color", "1 0 1");

Yes, those are strings. Most game developers don't like them. I do, and I will explain why later. Now, setting up properties for every object this way is both time consuming and memory intensive, and there will most likely be lots of instances with exactly the same property configuration. Therefore, property configurations are cached on a per-class basis:

GameObject::GameObject()
{
    if (mProperties.init("GameObject"))
    {
        mProperties.add("density", "1.0");
        mProperties.add("color", "1 0 1");
    }
}

So the property definitions are not actually stored in each object - each object merely holds a pointer to a definition, which is created by the first object using it. This of course reduces memory overhead by orders of magnitude. The system also handles inheritance, so a derived class can add more properties to an existing definition, but the amount of data per object to store the whole definition is still only a single pointer. In a derived class, the parent constructor will run first, adding its properties, and then each derived class adds its own in order. The init method detects this by being called several times with different strings.

Now that the property bag is configured I can start reading and writing properties using various get/set methods:

float density = mProperties.getFloat("density");

This will convert the string "1.0" to a float and return it. If the string isn't numerical, it will still return a valid number (0.0), but issue a warning. I also have some operator overloading to allow immediate conversion from most basic types:

float density = mProperties["density"];

The system accepts any type conversion on the fly:

mProperties["mass"] = 2.0f;
string str = mProperties["mass"]; // str is now "2.0"
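
The conversion operators aren't shown in the post; a minimal sketch of how such a proxy could work (entirely my own invented implementation, with the string formatting simplified):

#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>

// A proxy returned by operator[] that converts to and from strings.
class PropertyRef {
public:
    PropertyRef(std::string& v) : mValue(v) {}
    operator float() const { return (float)atof(mValue.c_str()); }
    operator std::string() const { return mValue; }
    PropertyRef& operator=(float f) {
        char buf[32];
        snprintf(buf, sizeof(buf), "%g", f);  // real formatting would differ
        mValue = buf;
        return *this;
    }
private:
    std::string& mValue;
};

class PropertyBag {
public:
    void add(const std::string& key, const std::string& def) { mValues[key] = def; }
    PropertyRef operator[](const std::string& key) { return PropertyRef(mValues[key]); }
private:
    std::map<std::string, std::string> mValues;
};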

This is all very flexible, but performance is somewhat questionable. I typically cache any property that is used every frame in a member variable. Therefore each game object has a loadProperties() method that gets called whenever something changes. This gives the object a chance to cache a local copy of each performance critical property:

void GameObject::loadProperties()
{
    mDensity = mProperties["density"];
}

So how much memory is being used? As long as an object doesn't override the default value, nothing is stored per object; hence adding more properties to a class doesn't make objects bigger, unless you cache a local copy per instance for performance reasons. The system also supports template values, so in addition to default values, each object can also be assigned a template (another pointer). Templates are defined in a simple XML file with key/value combinations that have no knowledge whatsoever about the object type. The same template can even be assigned to property bags of totally different types if wanted.

For instance, an object that wants to use a more slippery form of rubber can use the rubber template but override the friction property explicitly, and only the new friction value will be stored in the object.

Using strings has two obvious problems: a) it is slow and b) loose bindings are sensitive to typos. The first issue is seriously overrated IMO. Comparing strings is not as slow as most people claim, especially when using a custom string object with built-in storage. In most cases you only need to compare the first letter of two strings to detect a mismatch. The second issue is more worrying, but the system can easily issue warnings when asking for properties that do not exist. Then at least you will be notified of typos at runtime.

There are several benefits that I think clearly outweigh the negatives. The ability to serialize an object into XML/JSON/YAML/etc using solely the property bag is one of them (the entire level loading/saving code is 50 lines, and there is no additional serialization code per class); automatic property editing from a level editor is another. The editor doesn't need to know anything about what is being edited - it just presents all the properties of an object and their values as strings (another 50 lines of code to edit any property of any object in the game). Keeping the property names and their values as strings also allows for trivial scripting integration. We use Lua and have one very useful function to query any property: mgGet("name.property"), where name is the object name and property is the property name. The result is always a string, which can then be converted to whatever type seems fit (not even 50 lines of code to access any property of any object in the whole game from script).

One improvement I would like to make is how the property values are stored internally. Currently I store them as strings regardless of what they represent, but a more efficient way would of course be to store them in binary form, based on what they represent. The format can be chosen either during initialization or at every set operation.

I currently consider the property bag a blueprint of the object rather than a direct mapping of the object's internal state. Hence, if an object's internal state is updated at runtime, I don't update the properties at that point. That way I can easily reset an entire level to the state it was in at load time. There are benefits to keeping a direct mapping too, so I don't really have a strong opinion as long as it's consistent. A direct mapping certainly puts the property system under more stress. Has anyone tried?

Thursday, January 10, 2013

Development environment


A lot of people have asked about our development environment, so I thought I'd write a post about it here. Both Sprinkle and Granny Smith, as well as our two titles in development, follow roughly the same pattern. I do all day-to-day development and testing on Mac and PC, compiling to native Win32/MacOS applications. Hence the code is cross-platform, and mouse input is used to emulate touch input. My preference for actual coding is Visual Studio due to its superior debugging and Intellisense. On the Mac I use Textmate, shell scripts and makefiles. I only use Xcode for iOS testing and deployment. I have used the Xcode debugger on a few occasions, but it very rarely manages to do any meaningful on-target debugging. This might be due to my broken project files though... I run Visual Studio through VMWare on the Mac as well, so it's actually only one physical machine. I used a separate development PC laptop before, but the VMWare solution is far superior in almost every aspect and removes a lot of clutter.

I'm not using any off-the-shelf game engine, but rather a collection of homebrew classes that work well together, so you could say the "engine" is developed specifically for each title. Examples of such classes are strings, streams, input handling, data containers, vector math, scripting, sound, etc. It does not contain any code for actual game objects, rendering pipeline, update loop, etc., so it's rather low level. I personally think this is better than using a game engine in most cases, since you can squeeze out more performance, and you never hit any artificial boundaries of a third-party solution.

I do use certain libraries for specific tasks, such as zlib, TinyXML (just switched to RapidXML), Lua, Box2D, Clipper, etc, but they are very isolated pieces of software that do one thing well. If they didn't work for a project, they could easily be replaced individually. All libraries are included in the project as source code - no static or dynamic libraries.

The build system is a python script that scans the source tree and outputs a makefile, Visual Studio project or Android ant files. I have not yet tried Xcode project file generation, because Xcode project files (directories, actually) are totally horrible. To be fair, Visual Studio project files are horrible too. Actually, I don't think I have ever seen an IDE with a sane project file format. Is it really that hard? I should probably try cmake, but it takes time to learn, and doesn't necessarily update when you upgrade your IDE. I generally think that about a lot of things - learning a tool or middleware is often more of an investment than just doing it yourself. Besides, if you do it yourself you gain 100% insight into the inner workings and can fix problems immediately when they show up, instead of communicating with support and/or waiting for a fix. Anyway.

The desktop binaries read raw assets (XML, jpeg, png, etc) directly from the data folder for convenience, while both iOS and Android require asset conversion. The asset conversion script is quite similar to the build system - it scans a directory tree and outputs a makefile. Make will then automatically keep track of assets that need conversion using the file modification date, and it can branch out on multiple threads (using the -j switch) for increased performance. Make is really just as awesome for converting assets as it is for compiling source code. On iOS some images are compressed using texturetool and some use our custom format called MTX, which is essentially just a way to compress PNG images into JPEG with a separate, compressed alpha channel to save space. The Android version also uses MTX compression, as well as general LZ compression on most data files. The APK itself is not compressed, so generic asset compression is more important here.

The Android version uses the NativeActivity system available from Android 2.3, so basically the whole game, including setup code, is C++. We do have a very thin layer of Java to handle in-app billing, query device capabilities, etc. It's quite remarkable how much of the code is identical between the iOS and Android versions. It's really just the setup code, touch input and audio back-end that are completely different.

Sprinkle and Granny Smith both have built-in level editors that are accessible in the Mac and PC versions. All level design and graphic assets are done within that editor (except for textures, which are done in Photoshop and Illustrator). Scripting is done in Lua using a regular text editor. That's about it.