The Basics Of Streaming Audio

In previous articles, the basics of representing and playing non-streaming audio were covered. Now we’ll move on to streaming audio. This is not an article about easy function calls to play streaming audio formats. Instead, it covers the fundamental concepts of streaming audio.

Explaining The Diagrams

This page uses diagrams to illustrate examples. Small colored rectangles will represent PCM samples, and horizontally stacked collections of these blocks represent PCM buffers.

An example of a diagram of a PCM array filled with fresh PCM data.

Below is the color-coding that will be used.

The color coding used in the diagrams.

The blue color represents the current PCM sample being read. The location of the audio data currently playing will be referred to as the audio playhead. It moves from left to right through the PCM buffer. Green samples are audio data waiting to be played (fresh data). Once the speaker has played a sample, it is considered old data (dirty), and we color it black.

Why Stream Audio?

Here are three reasons why you would want to stream audio:

  1. The audio is from a live source. An example of this would be web conferencing.
  2. The audio is procedural and possibly created from user inputs (e.g., virtual instruments) – and unless you have a crystal ball, you don’t know what they’ll do in the near future or when they’ll stop playing.
  3. The audio is from a saved file, but we don’t want to spend the space to put the entire file into RAM. This also includes compressed file formats (such as MP3 or Ogg), since the PCM needs to be decompressed when we play it.

PCM Gets Used Fast

To get an idea of how much RAM would be consumed if we couldn’t stream audio, let’s run through a pretend scenario: we are given an audio source, and we don’t know how long it will play. It could be for two minutes, or it could be for two days.

To be conservative, we throw a whopping 4 gigs at it for a PCM buffer played at a sample rate of 44100 Hz. Let’s use floating-point PCM samples, so 4 bytes per sample.

\tiny{\frac{4294967296\ bytes}{1} \times \frac{1\ sample}{4\ bytes} \times \frac{1\ second}{44100\ samples} \times \frac{1\ minute}{60\ seconds} \times \frac{1\ hour}{60\ minutes} = 6.76\ hours}

6.76 hours for 4 gigs of PCM data (we’ll round up and say it’s 7). 7 hours sounds like a lot, but it really isn’t. Remember, we’re not talking about any specific game or application, but continuous digital audio in general. There are many situations where someone could stream more than 7 hours of continuous audio. Plus, that’s with 4 gigs, which is wasteful. We need a way to avoid allocating a PCM buffer for the entire length of the audio being played.
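If you want to play with these numbers yourself, here’s a small sketch of the same arithmetic in C++ (the 4 GiB buffer size, 32-bit float samples, and 44100 Hz sample rate are just the values assumed above, with a single channel assumed):

    #include <cstdio>

    int main() {
        // Values assumed in the example above (mono audio).
        const double bufferBytes    = 4294967296.0; // 4 GiB of PCM
        const double bytesPerSample = 4.0;          // 32-bit float samples
        const double sampleRate     = 44100.0;      // samples per second

        // bytes -> samples -> seconds -> hours
        const double samples = bufferBytes / bytesPerSample;
        const double seconds = samples / sampleRate;
        const double hours   = seconds / 3600.0;

        printf("%.2f hours of PCM\n", hours); // prints ~6.76 hours
        return 0;
    }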

The Cyclical Buffer

We don’t exactly have infinite memory, and even if we had a lot, we wouldn’t want to throw a wasteful amount into our audio systems.

With any fixed-size PCM buffer for playing audio, we will eventually reach the end of the array.
Top) Reading the PCM buffer from the beginning.
Middle) Halfway through the PCM buffer.
Bottom) Finished playing the PCM buffer. There are no samples left to play.

So how do we obtain an array for our PCM buffer that can play audio indefinitely?

Here’s an idea: what if we took a regular array in memory…

A PCM buffer.

… and curved it in on itself? That way we’d have a fixed number of PCM samples, but every sample would have another sample after it, and there would be no end to the array.

An array buffer with no end to it.

Maybe, but as soon as we read all the data, we’re just going to end up looping back and reading dirty data. So where does it get the PCM data it’s playing from?

Without writing new data, it doesn’t matter if the buffer never ends.
Left) Starting buffer with fresh PCM data.
Middle) Reading the PCM data, halfway through.
Right) All the data is read. Continuing to read would only replay old audio.

Well, what if we were constantly writing new data in front of the audio playhead? As the speaker plays, the old PCM data that was already played would be refilled with new PCM for it to play the next time it made a full round trip around the buffer.

As audio is being read, we write in new data for it to read, making sure it never re-reads old data.

But of course, grabbing an array of memory and somehow bending it into a circle isn’t a thing.
That’s preposterous! This is RAM we’re talking about, not a pretzel!
So we do the next best thing: when we’re done reading or writing at the end of the array, we loop back to the beginning.

After reading to the end, we restart from the beginning, as if the buffer looped back in on itself like a circle.

This strategy is called a cyclical buffer (also commonly known as a circular buffer or ring buffer).
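To make the idea concrete, here’s a minimal sketch of a cyclical buffer in C++. None of this is tied to any particular audio API; the struct and its tiny demo just illustrate how the read and write positions wrap around.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // A minimal sketch of a cyclical (ring) buffer of PCM samples.
    // Instead of the memory physically curving in on itself, the read and
    // write positions wrap back to index 0 once they pass the end.
    struct RingBuffer {
        std::vector<float> samples;
        std::size_t readPos  = 0; // the "playhead": where the next sample is read
        std::size_t writePos = 0; // where the next fresh sample is written

        explicit RingBuffer(std::size_t size) : samples(size, 0.0f) {}

        // Write one fresh sample in front of the playhead.
        void write(float s) {
            samples[writePos] = s;
            writePos = (writePos + 1) % samples.size(); // loop back at the end
        }

        // Read (play) one sample and advance the playhead.
        float read() {
            float s = samples[readPos];
            readPos = (readPos + 1) % samples.size(); // loop back at the end
            return s;
        }
    };

    int main() {
        RingBuffer buffer(4); // a tiny buffer so the wrap-around is easy to see

        // Write and read more samples than the buffer can hold; the
        // positions simply keep cycling through the same four slots.
        for (int i = 0; i < 10; ++i) {
            buffer.write(static_cast<float>(i));
            printf("read %.0f\n", buffer.read());
        }
        return 0;
    }

In a real audio system the reading side is driven by the audio hardware rather than by our own code; the sketch only shows how the indices loop back to the start.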

The Race Has Started!

Playing Audio Waits For No Man

In a previous article, we covered how computers have independent and dedicated audio systems to make sure audio doesn’t have to share compute and timing resources with the rest of the CPU.

Because audio is very time sensitive, it’s given its own processor and timing clock to run independently of the CPU.

This means we’re not explicitly in charge of moving the audio playhead or wrapping it back to the beginning. Instead, when we call system functions to play audio samples (PCM buffers), there’s an option we can set to tell the audio system to wrap to the beginning when it reaches the end of the buffer and loop indefinitely.
For Unity, this functionality is wrapped in the AudioSource.loop property. For other audio APIs and systems, you’ll have to consult their documentation for details.

But that means you’re still in charge of refreshing old information in the PCM buffer using the CPU. And since the audio playhead moves at its own speed, you’re constantly in a race to write in new information. If you fail to keep up, the playhead will keep plowing through the data and read the old PCM without a second thought. This failure is called a buffer underrun.

Simulated buffer underrun: “If you’ve ever played a game and noticed a hiccup in the performance of the application and the sound stutters (usually from loading something new or from an Alt+Tab switch), that is the audio underrunning. Or, if your computer experiences a blue screen of death while your audio is playing and then it kind of just trails off with a noise.”

This can also happen to large Kaiju fighting machines if the computer system streaming their PA audio gets gummed up.

Not only that, but you’re using the CPU to write this new data. That’s computing power you have to share with everything else, including the OS and other apps. Probably the most reliable way to ensure the writes are done on time is to dedicate a thread to writing the new data.
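As a rough sketch of that idea (again, not tied to any specific audio API), a dedicated writer thread can sit in a loop topping up the cyclical buffer. The fillNextChunk function and the shutdown flag below are hypothetical stand-ins for your own decoding or synthesis code.

    #include <atomic>
    #include <chrono>
    #include <thread>

    std::atomic<bool> stillStreaming{true};

    // Hypothetical stand-in: decode a file, mix a synth voice, receive
    // network audio, etc., and copy it into the cyclical PCM buffer just
    // ahead of the audio playhead.
    void fillNextChunk() {
        // ...
    }

    int main() {
        // The writer thread's only job is to keep fresh PCM ahead of the playhead.
        std::thread writer([] {
            while (stillStreaming.load()) {
                fillNextChunk();
                // Sleep briefly so we don't spin; a real system would instead
                // wait on a signal from the audio device asking for more data.
                std::this_thread::sleep_for(std::chrono::milliseconds(5));
            }
        });

        // ... the rest of the application does its own work here ...
        std::this_thread::sleep_for(std::chrono::seconds(1));

        stillStreaming.store(false);
        writer.join();
        return 0;
    }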

Demo

Below is a simulation. It helps me reach my quota for interactive content.

(Interactive simulation: sliders for Processor Speed, Write Speed, and Read Speed; controls labeled Running Sim, Halt Computer, and Streaming Audio; plus buttons to reset everything or restart the sim with the same parameters.)

The “Processor Speed” option isn’t the CPU speed per se; it’s just a parameter to simulate the CPU doing stuff elsewhere when it’s not writing new PCM data for your application.
It could even be because it’s busy streaming PCM data for another application.

Buffer Size And Latency

There’s a balancing act that needs to be understood. When creating the cyclic PCM buffer, we can choose how large to allocate it – i.e., how many samples it has.

  • On the one hand, if we have a bigger (cyclic) PCM buffer, it buys us more time before an underrun occurs.
  • But on the other hand, a bigger PCM buffer means more samples sit between where new data is written and where the playhead reads it, which also means more time passes before the most recent data is played.

This idea that we have streaming data that we want played immediately, but have to wait a short amount of time before we actually hear it, is called audio latency. A higher latency means a larger time delay, and the more samples the buffer has, the higher the latency.
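As a rough rule of thumb (assuming a single channel and ignoring any extra buffering the OS or driver adds on top), the latency contributed by the cyclic buffer is just its length in samples divided by the sample rate. For example, a 4096-sample buffer at 44100 Hz:

\frac{4096\ samples}{1} \times \frac{1\ second}{44100\ samples} \approx 0.093\ seconds \approx 93\ milliseconds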

For non-interactive media, no worries; no one’s going to complain their song took a second to start if it plays fine. But for interactive and real-time media, this causes issues. Some examples:

  • For a virtual instrument, if the musician plays a note, there’s a small delay between when they play the note and when it’s heard. This delay can make it difficult for the musician to play properly.
  • For an internet conference, someone could start talking over and interrupt another speaker because they didn’t realize that person was speaking until the latency delay passed.
  • If dynamic streaming audio is synced to real-time graphics, the audio might play late and not sync with the visuals.
It’s a metaphor! Get it? Because we’re talking about a balancing act, and it’s a balance scale that’s… I’m sure you get it.

And it doesn’t take a large amount of latency to cause these problems, only a fraction of a second. This means we need to choose a PCM buffer size that gives us enough freedom to process other things and provides a safety net against buffer underruns, but also keeps latency as low as possible.
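To get a feel for that trade-off, here’s a small sketch that prints the latency (which is also roughly the slack you have before an underrun) for a few candidate buffer sizes. The sizes and the 44100 Hz sample rate are just example values.

    #include <cstdio>

    int main() {
        const double sampleRate = 44100.0; // samples per second (example value)
        const int bufferSizes[] = {256, 1024, 4096, 16384}; // candidate sizes in samples

        // A bigger buffer gives the writer more slack before an underrun,
        // but every extra sample also delays when newly written audio is heard.
        for (int size : bufferSizes) {
            const double latencyMs = (size / sampleRate) * 1000.0;
            printf("%6d samples -> ~%.1f ms\n", size, latencyMs);
        }
        return 0;
    }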

– Stay strong, code on. William Leu