Before the Web Audio API, HTML5 gave us the <audio> element. It might seem hard to remember now, but prior to the <audio> element, our best option for sound in a browser was a plugin!
The <audio> element was indeed exciting, but it had a pretty singular focus. It was essentially a video player without the video, good for long audio like music or a podcast, but ill-suited for the demands of gaming. We put up with (or found workarounds for) looping issues, concurrent sound limits, glitches, and a total lack of access to the sound data itself.
Fortunately, our patience has paid off. Where the <audio> element may have been lacking, the Web Audio API delivers. It gives us unprecedented control over sound, and it's perfect for everything from gaming to sophisticated sound editing. All this with a tidy API that's really fun to use and well supported.
Let's be a little more specific: Web Audio gives you access to the raw waveform data of a sound and lets you manipulate, analyze, distort or otherwise modify it. It is to audio what the canvas API is to pixels. You have deep and mostly unfettered access to the sound data. It's really powerful!
This tutorial is the second in a series on Flight Arcade—built to demonstrate what’s possible on the web platform and in the new Microsoft Edge browser and EdgeHTML rendering engine. Interactive code and examples for this article are also located at Flight Arcade / Learn.
Flight Sounds
Even the earliest versions of Flight Simulator made efforts to recreate the feeling of flight using sound. One of the most important sounds is the engine, whose pitch changes with the throttle. We knew that, as we reimagined the game for the web, a static engine noise would seem flat, so the dynamic pitch of the engine noise was an obvious candidate for Web Audio.
Less obvious (but possibly more interesting) was the voice of our flight instructor. In early iterations of Flight Arcade, we played the instructor's voice just as it had been recorded, and it sounded as if it was coming out of a sound booth! We noticed that we started referring to the voice as the "narrator" instead of the "instructor".
Somehow that pristine sound broke the illusion of the game. It didn't seem right to have such perfect audio coming over the noisy sounds of the cockpit. So, in this case, we used Web Audio to apply some simple distortions to the voice instruction and enhance the realism of learning to fly!
In the sections below, I'll give you a pretty detailed view of how we used the Web Audio API to create these sounds.
Using the API: AudioContext and Audio Sources
The first step in any Web Audio project is to create an AudioContext object. Some browsers (including Chrome) still require this API to be prefixed, so the code looks like this:
// fall back to the prefixed constructor where the standard one is missing
var AudioContext = window.AudioContext || window.webkitAudioContext;
var audioContext = new AudioContext();
Then you need a sound. You can actually generate sounds from scratch with the Web Audio API, but for our purposes we wanted to load a prerecorded audio source. If you already had an HTML <audio> element, you could use that, but a lot of times you won't. After all, who needs an <audio> element if you've got Web Audio? Most commonly, you'll just download the audio directly into a buffer with an HTTP request:
// url and index come from the enclosing loader function
var request = new XMLHttpRequest();
request.open("GET", url, true);
request.responseType = "arraybuffer";

var loader = this;
request.onload = function() {
    // decode the downloaded bytes into an AudioBuffer
    loader.context.decodeAudioData(
        request.response,
        function(buffer) {
            if (!buffer) {
                console.log('error decoding file data: ' + url);
                return;
            }
            loader.bufferList[index] = buffer;
            // once every file has been decoded, hand the buffers back
            if (++loader.loadCount === loader.urlList.length) {
                loader.onload(loader.bufferList);
            }
        },
        function(error) {
            console.error('decodeAudioData error', error);
        }
    );
};

// nothing is downloaded until the request is sent
request.send();
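If you'd rather work with promises, the same download-and-decode step can be sketched as below. This is a minimal sketch and not the loader Flight Arcade uses: loadAudioBuffer and its fetchFn parameter are hypothetical names, and the code assumes a browser that provides fetch. It still relies on the callback form of decodeAudioData, the same call used above.

```javascript
// Hypothetical helper (not part of the original loader).
// context: an AudioContext; url: the audio file to download.
// fetchFn is optional and only exists so the function can be exercised
// without a network; in a browser it falls back to the global fetch.
function loadAudioBuffer(context, url, fetchFn) {
  var doFetch = fetchFn || fetch;
  return doFetch(url)
    .then(function (response) {
      if (!response.ok) {
        throw new Error('HTTP ' + response.status + ' while loading ' + url);
      }
      // read the response body as raw bytes
      return response.arrayBuffer();
    })
    .then(function (arrayBuffer) {
      // wrap the callback form of decodeAudioData in a Promise
      return new Promise(function (resolve, reject) {
        context.decodeAudioData(arrayBuffer, resolve, reject);
      });
    });
}
```

In a page, usage would look something like loadAudioBuffer(audioContext, 'engine.mp3').then(function (buffer) { ... }), where the file name is just a placeholder.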
Now we have an AudioContext and some audio data. Next step is to get these things working together. For that, we need...
AudioNodes
Just about everything you do with Web Audio happens via some kind of AudioNode, and they come in many different flavors: some nodes are used as audio sources, some as audio outputs, and some as audio processors or analyzers. You can chain them together to do interesting things.
You might think of the AudioContext as a sort of sound stage. The various instruments, amplifiers and speakers that it contains would all be different types of AudioNodes. Working with the Web Audio API is a lot like plugging all these things together (instruments into, say, effects pedals, the pedals into an amplifier, and the amplifier into speakers).
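To make the "plugging things together" idea concrete, here is a small sketch of a helper that wires a list of nodes in order. connectChain is a hypothetical name of our own, not part of the Web Audio API; it only assumes that every node has the standard connect method.

```javascript
// Hypothetical helper: connects each node to the next one in the list,
// like running a cable from each piece of gear to the next.
// Returns the last node in the chain.
function connectChain(nodes) {
  for (var i = 0; i < nodes.length - 1; i++) {
    nodes[i].connect(nodes[i + 1]);
  }
  return nodes[nodes.length - 1];
}
```

In a browser you might pass it a source node, a gain node from createGain(), and the context's destination, in that order.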
Well, to do anything interesting with our newly acquired AudioContext and audio data, we first need to wrap the audio data in a source AudioNode.
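var sourceNode = audioContext.createBufferSource();
// point the source at the decoded audio data (the buffer we loaded earlier)
sourceNode.buffer = buffer;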
Playback
That's it. We have a source. But before we can play it, we need to connect it to a destination node. For convenience, the AudioContext exposes a default destination node (usually your headphones or speakers). Once connected, it's just a matter of calling start and stop.
sourceNode.connect(audioContext.destination);
sourceNode.start(0); // begin playback immediately
// ...later, when the sound should end:
sourceNode.stop();
It's worth noting that you can only call start() once on each source node. That means "pause" isn't directly supported. Once a source is stopped, it's expired. Fortunately, source nodes are inexpensive objects, designed to be created easily (the audio data itself, remember, is in a separate buffer). So, if you want to resume a paused sound you can simply create a new source node and call start() with an offset into the buffer (its second parameter). AudioContext has an internal clock (currentTime) that you can use to keep track of how far playback had progressed.
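Here is one way the pause-and-resume bookkeeping could look. Treat it as a sketch under a few assumptions: PausableSound is a hypothetical wrapper of our own (not Flight Arcade code), the sound is not looping, and it is paused before it reaches the end of the buffer.

```javascript
// Hypothetical wrapper (names are illustrative).
// Every play() creates a fresh source node, because a source can only be started once.
function PausableSound(context, buffer) {
  this.context = context;
  this.buffer = buffer;
  this.source = null;  // the currently playing source node, if any
  this.startedAt = 0;  // context clock time when the current source started
  this.offset = 0;     // seconds into the buffer where playback should resume
}

PausableSound.prototype.play = function () {
  if (this.source) { return; } // already playing
  var source = this.context.createBufferSource();
  source.buffer = this.buffer;
  source.connect(this.context.destination);
  // start right away, this.offset seconds into the buffer
  source.start(0, this.offset);
  this.startedAt = this.context.currentTime;
  this.source = source;
};

PausableSound.prototype.pause = function () {
  if (!this.source) { return; } // nothing is playing
  this.source.stop();
  // remember how far we got, using the context's clock
  this.offset += this.context.currentTime - this.startedAt;
  this.source = null;
};
```

Calling play(), then pause(), then play() again picks up where the sound left off, at the cost of one new (cheap) source node per resume.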
The Engine Sound
That's it for the basics, but everything we've done so far (simple audio playback) could have been done with the old <audio> element. For Flight Arcade, we needed to do something more dynamic. We wanted the pitch to change with the speed of the engine.
That's actually pretty simple with Web Audio (and would have been nearly impossible without it)! The source node has a playbackRate property which affects the speed of playback. To increase the pitch we just increase the playback rate:
throttleSlider.onMove = function(val) {
    // playbackRate is an AudioParam, so we set its value
    sourceNode.playbackRate.value = val;
};
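The slider value doesn't have to be used as the rate directly. As a sketch, you could map a throttle position between 0 and 1 onto a range of playback rates. The function name and the rate range below are our own assumptions for illustration, not values from Flight Arcade.

```javascript
// Hypothetical mapping: throttle in [0, 1] -> playback rate in [minRate, maxRate].
// Out-of-range throttle values are clamped.
function throttleToPlaybackRate(throttle, minRate, maxRate) {
  var t = Math.min(1, Math.max(0, throttle));
  return minRate + t * (maxRate - minRate);
}
```

With a range of 0.5 to 2.0, an idle throttle plays the engine sample at half speed (lower pitch) and full throttle plays it at double speed (higher pitch). Keep in mind that playbackRate changes speed and pitch together.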
The engine sound also needs to loop. That's also very easy (there's a property for it too):
sourceNode.loop = true;
But there's a catch. Many audio formats (especially compressed audio) store the audio data in fixed size frames and, more often than not, the audio data itself won't "fill" the final frame. This can leave a tiny gap at the end of the audio file and result in clicks or glitches when those audio files get looped. Standard HTML audio elements don't offer any kind of control over this gap, and it can be a big challenge for web games that rely on looping audio.
Fortunately, gapless audio playback with the Web Audio API is really straightforward. It's just a matter of setting a timestamp for the beginning and end of the looping portion of the audio (note that these values are relative to the audio source itself and not the AudioContext clock).
// loop points are in seconds, measured from the start of the audio buffer
sourceNode.loopStart = 0.5;
sourceNode.loopEnd = 1.5;
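Since the loop points only take effect while loop is true, it can be handy to set all three properties in one place. The helper below is a sketch with a hypothetical name (setLoopRegion); it assumes the source already has its buffer assigned, and it clamps the loop points to the buffer's duration.

```javascript
// Hypothetical helper: enables looping between two points (in seconds),
// clamped so they stay inside the audio buffer.
function setLoopRegion(source, startSeconds, endSeconds) {
  var duration = source.buffer.duration;
  var start = Math.min(Math.max(0, startSeconds), duration);
  var end = Math.min(Math.max(start, endSeconds), duration);
  source.loop = true; // loopStart and loopEnd are only honored while loop is true
  source.loopStart = start;
  source.loopEnd = end;
  return source;
}
```

Picking loop points that sit just inside the padded start and end of the file is what removes the click.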
The Instructor's Voice
So far, everything we've done has been with a source node (our audio file) and an output node (the sound destination we set earlier, probably your speakers), but AudioNodes can be used for a lot more, including sound manipulation or analysis. In Flight Arcade, we used two node types (a ConvolverNode and a WaveShaperNode) to make the instructor's voice sound as if it's coming through a speaker.
Convolution
From the W3C spec:
Convolution is a mathematical process which can be applied to an audio signal to achieve many interesting high-quality linear effects. Very often, the effect is used to simulate an acoustic space such as a concert hall, cathedral, or outdoor amphitheater. It can also be used for complex filter effects, like a muffled sound coming from inside a closet, sound underwater, sound coming through a telephone, or playing through a vintage speaker cabinet. This technique is very commonly used in major motion picture and music production and is considered to be extremely versatile and of high quality.
Convolution basically combines two sounds: a sound to be processed (the instructor's voice) and a sound called an impulse response. The impulse response is, indeed, a sound file, but it's really only useful for this kind of convolution process. You can think of it as an audio filter of sorts, designed to produce a specific effect when convolved with another sound. The result is typically far more realistic than simple mathematical manipulation of the audio.
To use it, we create a convolver node, load the audio containing the impulse response, and then connect the nodes.
// create the convolver
var convolverNode = audioContext.createConvolver();

// assume we've already downloaded the telephone sound into a buffer
convolverNode.buffer = telephoneBuffer;

// connect the nodes
sourceNode.connect(convolverNode);
convolverNode.connect(audioContext.destination);
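If you're curious what "convolving" two sounds means at the sample level, the textbook definition fits in a few lines. This is only an illustration of the math: the function name is ours, and a real ConvolverNode uses much faster techniques and (by default) normalizes the impulse response.

```javascript
// Direct-form convolution of two sample arrays.
// Every input sample triggers a scaled copy of the impulse response,
// and all of those copies are summed into the output.
function convolve(signal, impulse) {
  if (signal.length === 0 || impulse.length === 0) {
    return new Float32Array(0);
  }
  var output = new Float32Array(signal.length + impulse.length - 1);
  for (var n = 0; n < signal.length; n++) {
    for (var k = 0; k < impulse.length; k++) {
      output[n + k] += signal[n] * impulse[k];
    }
  }
  return output;
}
```

An impulse response of [1] leaves the signal unchanged, while something like [1, 0, 0.5] adds a quieter echo two samples later. A recorded impulse response of a telephone speaker works the same way, just with many more samples.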
Wave Shaping
To increase the distortion, we also used a WaveShaper node. This type of node lets you apply mathematical distortion to the audio signal to achieve some really dramatic effects. The distortion is defined by a curve: an array of values that maps each input sample level to an output level. Generating a good curve can require some complex math. For the example below, we borrowed a good one from our friends at MDN.
// create the waveshaper
var waveShaper = audioContext.createWaveShaper();

// our distortion curve function
function makeDistortionCurve(amount) {
    var k = typeof amount === 'number' ? amount : 50,
        n_samples = 44100,
        curve = new Float32Array(n_samples),
        deg = Math.PI / 180,
        i = 0,
        x;
    for ( ; i < n_samples; ++i ) {
        // x runs from -1 to 1 across the curve
        x = i * 2 / n_samples - 1;
        curve[i] = ( 3 + k ) * x * 20 * deg / (Math.PI + k * Math.abs(x));
    }
    return curve;
}

// connect the nodes: source -> convolver -> waveshaper -> speakers
// (the convolver now feeds the waveshaper rather than the destination)
sourceNode.connect(convolverNode);
convolverNode.connect(waveShaper);
waveShaper.connect(audioContext.destination);

// vary the amount of distortion with the slider
distortionSlider.onMove = function(val) {
    waveShaper.curve = makeDistortionCurve(val);
};
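It can help to see how a curve like this gets applied to each sample. The sketch below follows our reading of how the spec describes it: an input sample between -1 and 1 is mapped onto a position in the curve, and the output is interpolated between the two nearest curve values. shapeSample is a hypothetical name, and the sketch ignores the node's oversample option.

```javascript
// Looks up one input sample (expected in the range -1..1) in a shaping curve,
// using linear interpolation between neighboring curve entries.
// Inputs outside -1..1 are pinned to the first or last curve value.
function shapeSample(x, curve) {
  var n = curve.length;
  var v = (n - 1) / 2 * (x + 1); // position within the curve
  if (v <= 0) { return curve[0]; }
  if (v >= n - 1) { return curve[n - 1]; }
  var k = Math.floor(v);
  var f = v - k; // fractional distance between entries k and k + 1
  return (1 - f) * curve[k] + f * curve[k + 1];
}
```

A straight-line curve such as [-1, 0, 1] passes the sound through untouched. The more the curve bends away from that line, the more the waveform is reshaped, which is what we hear as distortion.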
Notice the stark difference between the original waveform and the waveform with the WaveShaper applied to it.
The example above is a dramatic representation of just how much you can do with the Web Audio API. Not only are we making some pretty dramatic changes to the sound right from the browser, but we're also analyzing the waveform and rendering it into a canvas element! The Web Audio API is incredibly powerful and versatile and, frankly, a lot of fun!
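For the waveform rendering, the usual approach is to read time-domain samples from an AnalyserNode (getByteTimeDomainData fills a Uint8Array with values from 0 to 255, where 128 is silence) and turn them into canvas coordinates. The helper below is a sketch of just that conversion step; waveformToPoints is a hypothetical name and this is not the demo's actual drawing code.

```javascript
// Hypothetical helper: converts byte samples (0..255, 128 = silence)
// into x/y points that span a canvas of the given width and height.
function waveformToPoints(bytes, width, height) {
  var points = [];
  var step = bytes.length > 1 ? width / (bytes.length - 1) : 0;
  for (var i = 0; i < bytes.length; i++) {
    var amplitude = (bytes[i] - 128) / 128; // roughly -1..1
    // canvas y grows downward, so louder positive samples sit nearer the top
    points.push({ x: i * step, y: (1 - amplitude) * height / 2 });
  }
  return points;
}
```

In a page, you would create the analyser with createAnalyser(), connect the node you want to inspect to it, fill a Uint8Array of length fftSize with getByteTimeDomainData on each animation frame, and draw lines between the resulting points.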
More Hands-On With JavaScript
Microsoft has a bunch of free learning on many open-source JavaScript topics, and we’re on a mission to create a lot more with Microsoft Edge. Here are some to check out:
- Microsoft Edge Web Summit 2015 (a complete series of what to expect with the new browser, new web platform features, and guest speakers from the community)
- Best of //BUILD/ and Windows 10 (including the new JavaScript engine for sites and apps)
- Advancing JavaScript Without Breaking the Web (Christian Heilmann’s recent keynote)
- Hosted Web Apps and Web Platform Innovations (a deep-dive on topics like manifold.JS)
- Practical Performance Tips to Make Your HTML/JavaScript Faster (a seven-part series from responsive design to casual games to performance optimization)
- The Modern Web Platform Jump Start (the fundamentals of HTML, CSS, and JavaScript)
And some free tools to get started: Visual Studio Code, Azure Trial, and cross-browser testing tools—all available for Mac, Linux, or Windows.
This article is part of the web dev tech series from Microsoft. We’re excited to share Microsoft Edge and the new EdgeHTML rendering engine with you. Get free virtual machines or test remotely on your Mac, iOS, Android, or Windows device @ http://dev.modern.ie/.