Hey, I mentioned that I do video game music in a weird way, so here’s how!
Z.A.L. (source code) is a real-time music engine which I’ve been tinkering with for the last few years. It’s inspired by Sonic Pi and the beads library, as well as vertical remixing in video game music (such as in Banjo-Kazooie or FTL: Faster Than Light).
What does real-time music engine mean? Basically, the engine doesn’t use exported music in any way: no .mp3 or .ogg files or any of that. Instead, the music is rendered directly to the audio output, just-in-time™, with the composition controlled and interpreted logically (kinda like MIDI) directly in Java source code.
Why have you done this?
Well, if music can be made of data, logic and source code, it can be operated on programmatically, just like anything else in a video game!
Here are a couple of examples from Quest Giver where the music’s tempo, mode and instruments change (fairly) seamlessly when a new character shows up, all driven by how fast the player advances through the dialog.
Here the king gets his own chord:
Here two different characters get the same song played differently:
Here, the music changes modes and tempos:
(here are links to the clips in case I goofed up the video embeds :/ clip1 clip2 clip3)
How does it work?
Indeed, how does any of this work? As you might expect, it’s basically a very simple digital synthesizer, albeit one that plays automatically and responds to events from within a game.
Since it’s automatic and not played by hand on a keyboard, it needs a musical composition to follow!
I write the compositions in Java. The files end up kinda like bad sheet music. Here’s a simplified snippet of one:
// The chord decorator is kept as a field so handleSpeakerChanged (below) can reach it
private ChordDecorator mainChordifier;

public DragonSong() {
    // Sets the song's initial speed (1.75 beats per second, i.e. 105 beats per minute)
    super(1.75f);

    // Create our main track for the song, and wrap it in a decorator that can optionally create chords
    RepeatingTrack main = new RepeatingTrack(Instruments.PULSE_GUITAR);
    mainChordifier = new ChordDecorator(main);
    addTrack(mainChordifier);

    // Intro sequence: we just put the notes in, in the order we want!
    main.addInstant(HALF, C4);
    main.addInstant(QUARTER, D4);
    main.addInstant(QUARTER, G4);
    main.addInstant(QUARTER, F4);
    main.addInstant(QUARTER, E4);
    main.addInstant(HALF, C4);

    // The song's main loop will begin here, since the intro sequence is only played once
    main.setRepetitionPoint();

    // Chorus
    main.addInstant(EIGHTH, C4);
    main.addSilence(EIGHTH);
    main.addInstant(EIGHTH, D4);
    main.addInstant(EIGHTH, E4);
    main.addInstant(EIGHTH, F4);
    main.addInstant(EIGHTH, E4);
    main.addInstant(EIGHTH, D4);
    main.addSilence(EIGHTH);
    // .... (notes edited for brevity)
}

// This game-specific function is called whenever a new character talks to you in the game
public void handleSpeakerChanged(Time newTime, Speaker newSpeaker) {
    // On the final day of the dragon quest, we'll speed up the music to give it a sense of urgency
    if (newTime.day == SlayDragon.finalQuestDay) {
        addMeasureRunnable(() -> setTargetBPS(2.5f));
    }

    // Whenever the king is speaking, we'll add an octave overtone,
    // so if we were playing a middle C, it would also play the C an octave higher while he is on screen
    if (newSpeaker == Speaker.KING) {
        mainChordifier.setSimpleChordFunction(pitch -> pitch.octaveUp());
    } else {
        mainChordifier.resetChordFunction();
    }
}
Now that we have our basic composition, the music engine just plays through it continuously, using a lot of math, by:
- Figuring out which notes in the song are currently being played (i.e. where are we in the song)
- Assembling those notes into little packages with their pitches, amplitudes, envelopes and synths
- Sampling them into PCM waveforms, represented as float arrays (it’s like a little .WAV snippet)
- Combining and sending these float arrays to the audio device to be played!
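Steps 3 and 4 can be sketched in a few lines. This is a hypothetical, stripped-down version (the class and method names here aren’t from the actual engine): render one note into a float array with a simple fade-out envelope, then mix two notes together before they’d be handed to the audio device.

```java
// Minimal sketch of sampling a note into PCM floats, and mixing two tracks.
// Hypothetical names; the real engine also handles envelopes and synths per note.
public class NoteSampler {
    static final int SAMPLE_RATE = 44100;

    // Render a sine wave at frequencyHz for durationSeconds, with a linear
    // fade-out envelope so the note ends without an audible click
    static float[] sampleNote(double frequencyHz, double durationSeconds, float amplitude) {
        int n = (int) (durationSeconds * SAMPLE_RATE);
        float[] samples = new float[n];
        for (int i = 0; i < n; i++) {
            double t = (double) i / SAMPLE_RATE;
            float envelope = 1.0f - (float) i / n; // 1.0 at note start, 0.0 at note end
            samples[i] = amplitude * envelope * (float) Math.sin(2 * Math.PI * frequencyHz * t);
        }
        return samples;
    }

    // Combine two tracks by summing their samples, clamped to the [-1, 1] range
    static float[] mix(float[] a, float[] b) {
        float[] out = new float[Math.min(a.length, b.length)];
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.max(-1f, Math.min(1f, a[i] + b[i]));
        }
        return out;
    }
}
```

In the real engine the resulting float array is what gets pushed to the audio device each step.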
And that’s it, (hopefully) beautiful music is being played directly into your ear holes.
Wait, is that even feasible!?
Rendering music on the fly seems like a performance and synchronization nightmare, so let’s talk numbers!
All of this is running alongside a video game, in Java, on the same CPU as the game, but in its own thread. The music engine steps forward 60 times per second, with a 44,100 Hz sample rate, possibly in stereo. Is this remotely reasonable?
Well it ends up actually being unusually reasonable. Math time:
We’re sending float arrays to the audio device to play music, but how much data are we actually computing and sending? Well, we’re sampling at 44,100 Hz, so that’s 44,100 floats per second; double that for stereo music, and each float is 4 bytes, which gives us 44,100 samples x 2 (stereo) x 4 bytes = 352 800 bytes per second, or roughly 350 kilobytes/second. And of course those 44,100 x 2 = 88,200 floats need to be computed every second too. Gosh, that sounds like a lot of data processing.
But is it? Let’s compare with the typically heaviest part of a video game, graphics! What is the raw graphical bitrate of a game in standard conditions?
Well, let’s say we’re playing in 1080p (1920 x 1080 = 2 073 600 pixels), at 60 frames per second, with 32 bit color depth (8 bits for each of RGBA), we get:
2 073 600 pixels x 60 frames per second x 4 bytes per pixel = 497 664 000 bytes per second, or roughly 500 megabytes/second! That’s over 1000 times more throughput than the audio, phew! Now you know why gamers™ talk so much about graphics card performance!
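The back-of-the-envelope numbers above are easy to double-check in plain arithmetic (no engine code assumed), including how many sample frames each 60 Hz engine step has to produce:

```java
// Sanity-checking the audio vs. graphics throughput math from the post
public class ThroughputMath {
    // Audio: 44,100 samples/second, stereo, 4 bytes per float sample
    static long audioBytesPerSecond() {
        return 44_100L * 2 * 4;
    }

    // Graphics: 1080p at 60 fps, 4 bytes per pixel (RGBA)
    static long videoBytesPerSecond() {
        return 1920L * 1080 * 60 * 4;
    }

    // Sample frames the engine must produce per 60 Hz step
    static long framesPerEngineStep() {
        return 44_100L / 60;
    }

    public static void main(String[] args) {
        System.out.println(audioBytesPerSecond());   // 352800  (~350 kB/s)
        System.out.println(videoBytesPerSecond());   // 497664000  (~500 MB/s)
        System.out.println(videoBytesPerSecond() / audioBytesPerSecond()); // 1410
        System.out.println(framesPerEngineStep());   // 735
    }
}
```

So graphics push roughly 1,400x more raw bytes than the audio, and each engine step only has to fill 735 stereo frames.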
So anyway, it turns out that Java is pretty fast these days and CPUs are extremely fast these days (there’s a reason computers don’t have sound cards anymore; it seems cd-quality music just isn’t a strain on a modern-ish CPU). As a rough practical benchmark on my 2015 laptop, when I run the engine in export mode (playing the music into .wav sound files as fast as possible, with simulated game events), it can export a 20 minute album in 10 seconds. That’s about 120x faster than real time, so real-time playback uses less than 1% of a CPU core for rendering the audio!
A final neat side effect is that by storing the music for the game logically in source code, the music adds only 100~200 kilobytes to the final size of the game, instead of 100+ megabytes!
Some minor caveats
- This is for desktop audio processing. I tried this on web using GWT to compile to JavaScript and it was laggy and ear-splitting. I don’t know if it’s the single-threaded nature of JavaScript, or something more fundamental to browsers, but oof. I would also expect similar issues with Android/iOS processing, since I believe their audio latency is much higher, in the 100~200 millisecond range (my laptop audio has a ~20ms latency), so real-time could be challenging.
- This is a crude synthesizer at best, it will never approach the quality or flexibility of digital audio workstations, especially since I’m neither a trained audio programmer nor a trained composer. But! Games are interactive. If graphics can react to the player instantaneously, why shouldn’t the music?
- For technical reasons, the music engine needs to run in its own thread. Not much of an issue since all CPUs have multiple cores these days, but it makes the programming a little trickier. That said, it probably insulates the engine from the actual video game’s CPU usage.
- If you actually do want music files, say to turn your game’s OST into an album, you have to go out of your way to orchestrate a version of each song for non-looping playback (and script simulated game events if you want to showcase those changes in the album).
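For what it’s worth, the export path itself is straightforward on desktop Java. Here’s a hypothetical sketch (the class is mine, but the javax.sound.sampled calls are the standard ones): convert the rendered floats to 16-bit PCM bytes and hand them to AudioSystem.write:

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;

// Hypothetical sketch of "export mode": writing rendered float samples to a .wav file
public class WavExporter {
    static final int SAMPLE_RATE = 44100;

    // Convert [-1, 1] floats to little-endian signed 16-bit PCM bytes
    static byte[] toPcm16(float[] samples) {
        byte[] out = new byte[samples.length * 2];
        for (int i = 0; i < samples.length; i++) {
            int s = (int) (Math.max(-1f, Math.min(1f, samples[i])) * 32767);
            out[2 * i] = (byte) (s & 0xFF);          // low byte first (little-endian)
            out[2 * i + 1] = (byte) ((s >> 8) & 0xFF);
        }
        return out;
    }

    // Wrap the PCM bytes in an AudioInputStream and let AudioSystem write the .wav
    static void export(float[] samples, File file) throws IOException {
        AudioFormat format = new AudioFormat(SAMPLE_RATE, 16, 1, true, false); // mono, signed, little-endian
        byte[] pcm = toPcm16(samples);
        try (AudioInputStream stream = new AudioInputStream(
                new ByteArrayInputStream(pcm), format, samples.length)) {
            AudioSystem.write(stream, AudioFileFormat.Type.WAVE, file);
        }
    }
}
```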
What’s next?
The engine works (I still can’t believe it, honestly), but the kinds of music it can make are pretty basic (very chiptuney sounding, I know). There is hope, though! I’ve been experimenting with FM synthesis and full resynthesis to get more nuanced and complex synths/instruments. To actually make nice music, I mostly need to practice composing more, and learn some music theory haha.
Anyway, that’s where it’s at for now. Any thoughts, suggestions or criticism are very appreciated! Also, is this an insanely convoluted way to make video game music!? Let me know! :3