Music that is capable of changing dynamically and seamlessly to reflect what is happening on-screen can add a whole new level of immersion to a game. In this tutorial we take a look at one of the easier ways to add responsive music to a game.
Note: Although this tutorial is written using JavaScript and the Web Audio API, you should be able to use the same techniques and concepts in almost any game development environment.
Demo
Here’s a live responsive music JavaScript demo for you to play with (with downloadable source code). You can watch a recorded version of the demo in the following video if your web browser cannot run the live demo:
Important Note: At the time of writing this tutorial, the W3C Web Audio API (used by the JS demo) is an experimental technology and is only available in the Google Chrome web browser.
Introduction
Journey, a game developed by thatgamecompany, is a good starting point for this tutorial. The game’s graphics and music fuse together to create a stunning and emotional interactive experience, but there is something special about the music in the game that makes the experience as powerful as it is – it flows seamlessly through the entire game, and evolves dynamically as the player progresses and triggers certain in-game events. Journey uses ‘responsive’ music to enhance the emotions the player experiences while playing the game.
To be fair, a lot of modern games do use responsive music in one way or another – Tomb Raider and BioShock Infinite are two examples that spring to mind – and almost every game can benefit from it.
So how can you actually add responsive music to your games? Well, there are numerous ways of achieving this; some ways are a lot more sophisticated than others and require multiple audio channels to be streamed from a local storage device, but adding some basic responsive music to a game is actually quite easy if you have access to a low-level sound API.
We are going to take a look at one solution that is simple and lightweight enough to be used today in online games, including JavaScript-based games.
In a Nutshell
The easiest way to achieve responsive music in an online game is by loading a single audio file into memory at runtime, and then programmatically looping specific sections of that audio file. This requires a coordinated effort from the game programmers, sound engineers, and designers.
The first thing we need to consider is the actual structure of the music.
Music Structure
The responsive music solution that we are looking at here requires the music to be structured in a way that allows parts of the musical arrangement to be looped seamlessly – these loopable parts of the music will be called ‘zones’ throughout this tutorial.
As well as zones, the music can contain non-loopable parts that are used as transitions between zones – these will be called ‘fills’ throughout the remainder of this tutorial.
The following image visualises a very simple music structure consisting of two zones and two fills:
If you are a programmer who has used low-level sound APIs before, you may have already worked out where we are going with this: if the music is structured in such a way that it allows parts of the arrangement to be looped seamlessly, the music can be programmatically sequenced – all we need to know is where the zones and fills are located within the music. That’s where a descriptor file comes in useful.
Note: There must not be any silence at the beginning of the music; it must begin immediately. If there is a random chunk of silence at the beginning of the music the zones and fills in the music will not be aligned to bars (the importance of this will be covered later in this tutorial).
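One way to catch stray silence before it causes alignment problems is to scan the decoded audio. The following JavaScript sketch assumes you already have one channel of the music as a Float32Array (in the Web Audio API, AudioBuffer.getChannelData() returns exactly that); the leadingSilenceSamples name and the 1e-4 threshold are hypothetical choices, not part of the demo's source code.

```javascript
// Hypothetical helper: counts the near-silent samples at the start of one
// channel of decoded audio. The threshold is an assumption; adjust to taste.
function leadingSilenceSamples(samples, threshold = 1e-4) {
  for (let i = 0; i < samples.length; i++) {
    // The first sample louder than the threshold marks the end of the silence.
    if (Math.abs(samples[i]) > threshold) {
      return i;
    }
  }
  // The whole channel is silent.
  return samples.length;
}

// Example: three near-silent samples followed by audio.
const channel = new Float32Array([0, 0, 0.00001, 0.25, -0.5]);
console.log(leadingSilenceSamples(channel)); // 3
```

A non-zero result suggests the file should be trimmed before the zones and fills are measured. Compressed formats can add padding of their own (MP3 encoders, for example, typically insert a short delay at the start), so it is worth running the check on the file the game actually ships.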
Music Descriptor
If we want to be able to programmatically play and loop specific parts of a music file, we need to know where the music zones and fills are located within the music. The most obvious solution is a descriptor file that can be loaded along with the music, and to keep things simple we are going to use a JSON file because most programming languages are capable of decoding and encoding JSON data these days.
The following is a JSON file that describes the simple music structure in the previous image:
```json
{
  "bpm": 120,
  "bpb": 4,
  "structure": [
    { "type": 0, "size": 2, "name": "Relaxed" },
    { "type": 0, "size": 2, "name": "Hunted" },
    { "type": 1, "size": 1, "name": "A" },
    { "type": 1, "size": 1, "name": "B" }
  ]
}
```
- The bpm field is the tempo of the music, in beats per minute.
- The bpb field is the time signature of the music, in beats per bar.
- The structure field is an ordered array of objects that describe each zone and fill in the music.
- The type field tells us whether the object is a zone or a fill (zero and one respectively).
- The size field is the length of the zone or fill, in bars.
- The name field is an identifier for the zone or fill.
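Because the descriptor is plain JSON, loading it is a single JSON.parse() call, but a little validation catches authoring mistakes early. The sketch below is hypothetical (the parseDescriptor name and its checks are assumptions, not part of the demo); it rejects unknown type values, sizes that are not a whole number of bars, and duplicate names, since a lookup by name would silently pick only one of them.

```javascript
// Hypothetical descriptor loader; the checks are assumptions based on the
// field descriptions above, not part of the demo's source code.
const ZONE = 0; // loopable part
const FILL = 1; // non-loopable transition

function parseDescriptor(jsonText) {
  const descriptor = JSON.parse(jsonText);
  const { bpm, bpb, structure } = descriptor;
  if (typeof bpm !== "number" || bpm <= 0) {
    throw new Error("bpm must be a positive number");
  }
  if (!Number.isInteger(bpb) || bpb <= 0) {
    throw new Error("bpb must be a positive whole number");
  }
  if (!Array.isArray(structure)) {
    throw new Error("structure must be an array");
  }
  const seen = new Set();
  for (const part of structure) {
    if (part.type !== ZONE && part.type !== FILL) {
      throw new Error(`unknown type ${part.type} for "${part.name}"`);
    }
    // Zones and fills are aligned to bars, so sizes are whole bars.
    if (!Number.isInteger(part.size) || part.size <= 0) {
      throw new Error(`"${part.name}" must be a whole, positive number of bars`);
    }
    // Parts are looked up by name, so names must be unique.
    if (typeof part.name !== "string" || seen.has(part.name)) {
      throw new Error(`missing or duplicate name: ${part.name}`);
    }
    seen.add(part.name);
  }
  return descriptor;
}
```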
Music Timing
The information in the music descriptor allows us to calculate various time related values that are needed to accurately play the music through a low-level sound API.
The most important bit of information we need is the length of a single bar of music, in samples. The musical zones and fills are all aligned to bars, and when we need to transition from one part of the music to another the transition needs to happen at the start of a bar – we don’t want the music to jump from a random position within a bar because it would sound really disconcerting.
The following pseudocode calculates the sample length of a single bar of music:
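```
bpm = 120   // beats per minute
bpb = 4     // beats per bar
srt = 44100 // sample rate
// bars per minute = bpm / bpb, so one bar lasts 60 / ( bpm / bpb ) seconds
bar_length = srt * ( 60 / ( bpm / bpb ) ) // 44100 * 2 = 88200 samples
```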
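The same calculation in JavaScript, as a small hypothetical helper (barLengthInSamples is not a name from the demo). At 120 beats per minute with 4 beats per bar there are 30 bars per minute, so each bar lasts 2 seconds, which is 88200 samples at 44100 Hz. Multiplying before dividing avoids rounding the intermediate bpm / bpb value.

```javascript
// Hypothetical helper: length of one bar of music, in samples.
// Bars per minute = bpm / bpb, so one bar lasts 60 * bpb / bpm seconds.
function barLengthInSamples(bpm, bpb, sampleRate) {
  // Multiply first, divide last, to avoid rounding the bpm / bpb fraction.
  return (sampleRate * 60 * bpb) / bpm;
}

console.log(barLengthInSamples(120, 4, 44100)); // 88200 (2 seconds per bar)
console.log(barLengthInSamples(130, 4, 44100)); // about 81415.38, not a whole number
```

Many tempos do not give a whole number of samples per bar, so any sample positions derived from the result need to be rounded somewhere.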
With the bar_length calculated, we can now work out the sample position and length of the zones and fills within the music. In the following pseudocode we simply loop through the descriptor's structure array and add two new values to the zone and fill objects:
```
i = 0
n = descriptor.structure.length // number of zones and fills
s = 0
while( i < n ) {
    o = descriptor.structure[i++]
    o.start  = s                  // sample position of the zone or fill
    o.length = o.size * bar_length // sample length of the zone or fill
    s += o.length
}
```
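A JavaScript version of the loop above, under one extra assumption: that bar_length may be fractional. Rather than adding a rounded length to a running total (which lets rounding errors accumulate), this hypothetical layoutStructure helper rounds the cumulative position of each boundary, so every part starts exactly where the previous one ends.

```javascript
// Hypothetical helper: adds start and length (in samples) to each zone and
// fill, in descriptor order. Boundaries are computed from the cumulative bar
// count and rounded individually, so a fractional barLength never causes the
// positions to drift apart over many sections.
function layoutStructure(structure, barLength) {
  let bars = 0; // bars that come before the current zone or fill
  for (const part of structure) {
    const start = Math.round(bars * barLength);
    const end = Math.round((bars + part.size) * barLength);
    part.start = start;
    part.length = end - start; // each part ends where the next one starts
    bars += part.size;
  }
  return structure;
}
```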
For this tutorial, that is all of the information we need for our responsive music solution – we now know the sample position and length of each zone and fill in the music, which means we are able to play the zones and fills in any order we like. Essentially, we can now programmatically sequence an infinitely long music track at runtime with very little overhead.
Music Playback
Now that we have all of the information we need to play the music, programmatically playing zones and fills from the music is a relatively simple task, and we can handle this with two functions.
The first function deals with pulling samples from our music file and pushing them to the low-level sound API. Again, I'll demonstrate this using pseudocode because different programming languages have different APIs for this kind of thing, but the theory is the same in all of them.
```
input           // buffer containing the samples from our music
output          // low-level sound API output buffer
playhead = 0    // position of the playhead within the music file, in samples
start    = 0    // start position of the active zone or fill, in samples
length   = 0    // length of the active zone or fill, in samples (0 = nothing queued yet)
next     = null // the next zone or fill (object) that needs to be played

// invoked whenever the low-level sound API requires more sample data
function update() {
    i = 0
    n = output.length // sample length of the output buffer
    while( i < n ) {
        // is the playhead at the end of the active zone or fill?
        // (the end is start + length, checked every sample because both can change)
        if( playhead >= start + length ) {
            // is another zone or fill waiting to be played?
            if( next != null ) {
                start  = next.start
                length = next.length
                next   = null
            }
            // reset the playhead to the start of the active zone or fill
            playhead = start
        }
        // nothing has been queued yet, so output silence
        if( length == 0 ) {
            output[i++] = 0
            continue
        }
        // pull samples from the input and push them to the output
        output[i++] = input[playhead++]
    }
}
```
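Here is one way the update() pseudocode could look in JavaScript. It is a sketch, not the demo's source: the ZonePlayer class is hypothetical, it handles a single (mono) channel, and it expects the caller to supply the output buffer. In a browser, update() would be called from whatever callback the audio API uses to request more samples, for example an AudioWorkletProcessor's process() method; that wiring is left out here so the logic can run on its own.

```javascript
// Hypothetical JavaScript version of the update() pseudocode (mono only).
class ZonePlayer {
  constructor(input) {
    this.input = input; // Float32Array containing the decoded music
    this.playhead = 0;  // position within the music, in samples
    this.start = 0;     // start of the active zone or fill, in samples
    this.length = 0;    // length of the active zone or fill (0 = none yet)
    this.next = null;   // queued zone or fill ({ start, length }) or null
  }

  // Fills `output` (e.g. a Float32Array) with the next block of samples.
  update(output) {
    for (let i = 0; i < output.length; i++) {
      // At the end of the active zone or fill: switch to the queued one or loop.
      if (this.playhead >= this.start + this.length) {
        if (this.next !== null) {
          this.start = this.next.start;
          this.length = this.next.length;
          this.next = null;
        }
        this.playhead = this.start;
      }
      // Output silence until something has been queued.
      output[i] = this.length > 0 ? this.input[this.playhead++] : 0;
    }
  }
}
```

Queuing a zone is just a matter of assigning an object with start and length properties to player.next; the switch happens the next time the playhead reaches the end of the active zone or fill.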
The second function is used to queue the next zone or fill that needs to be played:
```
// param 'name' is the name of the zone or fill (defined in the descriptor)
function setNext( name ) {
    i = 0
    n = descriptor.structure.length // number of zones and fills
    while( i < n ) {
        o = descriptor.structure[i++]
        if( o.name == name ) {
            // set the 'next' value and return from the function
            next = o
            return
        }
    }
    // the requested zone or fill could not be found
    throw new Exception()
}
```
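Because setNext() scans the whole structure array on every call, a game that changes zones often might prefer to build a lookup table once. The sketch below is hypothetical: makeQueueFunction builds a Map from names to zones and fills and passes the match to a callback (onQueue), which would typically store it as the player's next value.

```javascript
// Hypothetical alternative to the linear search in setNext(): build a
// name -> zone/fill Map once, then each call is a single lookup.
function makeQueueFunction(structure, onQueue) {
  const byName = new Map(structure.map((part) => [part.name, part]));
  return function setNext(name) {
    const part = byName.get(name);
    if (part === undefined) {
      // The requested zone or fill is not in the descriptor.
      throw new Error(`unknown zone or fill: ${name}`);
    }
    onQueue(part); // e.g. store it as the player's `next` value
  };
}
```

One behavioural difference: if two entries share a name, the Map keeps the last one whereas the linear search above finds the first, so names in the descriptor should be unique.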
To play the ‘Relaxed’ zone of music, we would call setNext("Relaxed"), and the zone would be queued and then played at the next possible opportunity (when the playhead reaches the end of the active zone or fill).
The following image visualises the playback of the ‘Relaxed’ zone:
To play the ‘Hunted’ zone of music, we would call setNext("Hunted"):
Believe it or not, we now have enough to work with to add simple responsive music to any game that has access to a low-level sound API, but there is no reason why this solution needs to remain simple – we can play various parts of the music in any order we like, and that opens the door to more complex soundtracks.
One of the things we could do is group together various parts of the music to create sequences, and those sequences could be used as complex transitions between the different zones in the music.
Music Sequencing
Grouping together various parts of the music to create sequences will be covered in a future tutorial, but in the meantime consider what is happening in the following image:
Instead of transitioning directly from a very loud section of music to a very quiet section of music, we could quieten things down gradually using a sequence – that is, a smooth transition.
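As a rough illustration of the idea (not the approach the future tutorial will take), a single next slot is not enough for sequences: if only a fill is queued, it will simply loop once it ends. A first-in, first-out queue addresses that. In the hypothetical SequenceQueue below, advance() is meant to be called whenever the active part finishes; it returns the next queued part, or the current part again when nothing is pending, so the last zone in a sequence keeps looping.

```javascript
// Hypothetical sketch: a FIFO queue of zones/fills instead of a single
// `next` slot. When the queue is empty, the active part keeps looping.
class SequenceQueue {
  constructor() {
    this.pending = []; // parts waiting to be played, in order
  }

  // Queue several parts to play back to back.
  enqueue(...parts) {
    this.pending.push(...parts);
  }

  // Call when the active part has finished; returns what should play next.
  advance(current) {
    return this.pending.length > 0 ? this.pending.shift() : current;
  }
}
```

For instance, queuing "B" and then "Relaxed" while "Hunted" is playing would play fill B once and then settle into the Relaxed zone.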
Conclusion
We have looked at one possible solution for responsive game music in this tutorial, using a music structure and a music descriptor, and the core code required to handle the music playback.
Responsive music can add a whole new level of immersion to a game and it is definitely something that game developers should consider taking advantage of when starting the development of a new game. Game developers should not make the mistake of leaving this kind of thing until the last stages of development, though; it requires a coordinated effort from the game programmers, sound engineers, and designers.