The primitive waves are just building blocks. Most audio is not made of primitive waves, but of more complex forms of sound.
This article is a continuation of this earlier article on tone generation. That article covered simple primitive wave types; in this article, we're going to cover non-primitive waves.
Arbitrary Tones
So the previous article covered low-level primitives that are often available in DAWs. But really, any wave can be sent to the speaker to make noise. And if the wave repeats at a periodic rate, it will generate a tone.
The path and shape of such a wave is called a waveform. It's basically a graph of three things that are pretty much one and the same:
- The electric voltage that will be sent to your speakers to control an electromagnet.
- The movement of the voice coil and diaphragm.
- The compressing and rarefying pressure waves being emitted from the speakers.
The first thing (voltage change) causes the second thing (speaker movement), which causes the third thing (pressure waves).
Fourier Synthesis
Pretty much any waveform pattern can be constructed from an infinite combination of particular sine waves added together. Because summing infinitely many sine waves is not practical, a finite number of sine waves can be added as an approximation.
All of the simple primitives previously mentioned, except for the sine wave, are actually kind of deceptive – because sine waves can reconstruct them. This process is known as Fourier synthesis. Okay, so in theory, primitive waves can be constructed from multiple sine waves? Actually, in high-end DAWs, they are. This is exactly what they do: construct primitives using Fourier synthesis. It's generally more expensive – calculating trig functions usually is, let alone a series of them per audio sample. But it's also technically more correct and solves some issues that stem from having vertical lines in the signal (i.e., an infinite response).
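To make that concrete, here's a minimal sketch of Fourier synthesis in Python with NumPy (my own toy illustration, not code from any particular DAW) that approximates a square wave by summing its odd harmonics:

```python
import numpy as np

SAMPLE_RATE = 44100  # samples per second

def fourier_square(freq, duration, num_harmonics):
    """Approximate a square wave by summing odd sine harmonics.

    The ideal square wave is sum over odd k of (4 / (pi * k)) * sin(2*pi*k*f*t);
    here we simply truncate the series after `num_harmonics` terms.
    """
    t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
    wave = np.zeros_like(t)
    for n in range(num_harmonics):
        k = 2 * n + 1  # odd harmonic numbers: 1, 3, 5, ...
        wave += (4 / (np.pi * k)) * np.sin(2 * np.pi * k * freq * t)
    return wave

# A 440 Hz "square" wave built from only 10 sine waves -- already sounds
# very close to the idealized primitive.
approx = fourier_square(440.0, duration=1.0, num_harmonics=10)
```

The more harmonics you add, the closer the result gets to the idealized square shape (aside from some ripple near the jumps).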
This is an English (American, to be specific) article talking about stuff named after a Frenchman, so just to be safe, I'm going to perform a public service and take a moment to include audio of the pronunciation (nativized to English, of course). It's pronounced foo-ree-ey:
Supplementary links:
- This Geogebra collection has some animations and interactive demos reconstructing a sawtooth wave and a triangle wave using sine waves.
- Here’s a Smarter Every Day episode going deeper into the Fourier series.
- Here’s a BlackPawn article on circular harmonics. It’s just the Fourier series, but he does it in the context of being a 2D implementation of spherical harmonics.
- TechTarget talks about Fourier synthesis.
Handling Infinite Responses & Other Issues
The clean diagrams for primitive waves are often not how they're implemented in practice. Instead, they're constructed as a combination of sine waves.
A perfectly vertical line in the waveform is called an infinite response because it would require the speaker to move at an infinite speed to be played. It would literally require parts of the speaker to teleport. While this obviously doesn't happen in practice, a waveform with drastic changes can still cause problems, such as transients, and can make speakers with different frequency responses sound inconsistent.
Transients are rapid speaker jumps that can cause speakers to make unintended popping sounds.
In this context, the frequency response is the speaker's ability to physically reproduce the audio signal at high quality across the entire spectrum of human hearing (which is about 20 Hz to 20,000 Hz).
This doesn't happen with just infinite-response signals, but with other types of signals as well. In the previous post, illustrations were given for square waves, sawtooth waves, and triangle waves, showing their idealized forms constructed of straight lines. In theory, these straight lines would require an infinite number of sinusoids to be added together to be correctly represented. This is impossible, and if we just let the speaker try anyway, it would again cause the audio to be inconsistent across various speakers' frequency responses. It's also impractical because each sinusoid added in the Fourier series is at an incrementally higher frequency: eventually, we would be adding frequencies above human hearing just to smooth out the finer details of the wave's shape.
And yes, different speakers of all shapes, sizes, construction, and quality naturally sound different. But the issues being covered here aren't because of the speakers' different characteristics; they're because you're butting up against practical and physical limitations. Then again, I'm no audio engineer, so I don't know how I'm supposed to feel about one source of variation vs. another. Meh…
Here are some notes about sinusoid constructions:
- Reconstruction of a sawtooth wave in Desmos.
- Reconstruction of a square wave in Desmos. Here’s a better interactive example.
- Reconstruction of a triangle wave in Desmos.
- Wikipedia illustrations on constructing a sawtooth [1][2].
- Wikipedia audio of sine waves being incrementally added to form a sawtooth wave. Note how you can hear the upper partials get ever-higher while keeping its base note.
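To tie those notes back to the infinite-response problem, here's a rough Python/NumPy sketch (again, my own illustration) of a band-limited sawtooth: it keeps adding harmonics but stops before any of them would exceed the Nyquist frequency (half the sample rate), which also keeps them out of the range nobody can hear anyway:

```python
import numpy as np

SAMPLE_RATE = 44100

def bandlimited_saw(freq, duration):
    """Sawtooth built from sine harmonics, stopping below Nyquist.

    One common convention for the ideal sawtooth is the sum over k of
    (2 / (pi * k)) * sin(2*pi*k*f*t) (up to sign/offset). We stop adding
    partials once k * freq would go past half the sample rate.
    """
    nyquist = SAMPLE_RATE / 2
    t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
    wave = np.zeros_like(t)
    k = 1
    while k * freq < nyquist:  # stop before the partial would alias
        wave += (2 / (np.pi * k)) * np.sin(2 * np.pi * k * freq * t)
        k += 1
    return wave

# At 110 Hz, roughly 200 partials fit below Nyquist.
saw = bandlimited_saw(110.0, duration=1.0)
```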
Supplementary links:
- CenterSpeakerZone's buyer guide about frequency response.
Construction of New Sounds & Sound Design
The skill of manipulating and creating sounds and tones to have a desired quality and emotional effect is called sound design.
There are a few basic strategies for constructing and manipulating sound:
- Tone generation of a sine wave. We could include other primitives too, but that arguably involves additive synthesis if we consider sinusoidal construction.
- Adding existing audio signals on top of each other. This is called additive synthesis. It includes adding a signal to itself (e.g., delay, reverb) as well as adding upper partials to a simpler wave.
- Subtracting existing audio from each other. This is called subtractive synthesis.
- Multiplying audio with each other. This is called amplitude modulation.
- Using one audio signal as a parameter to shift the frequency of another signal. This is known as frequency modulation synthesis. (A rough sketch of amplitude and frequency modulation follows this list.)
- Amplifying or suppressing the frequency content of an audio signal. This is called equalization.
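As a rough illustration of two of these strategies (amplitude modulation and frequency modulation), here's a short Python/NumPy sketch. The parameter values and names are just mine, not from any particular synth:

```python
import numpy as np

SAMPLE_RATE = 44100
t = np.arange(SAMPLE_RATE * 2) / SAMPLE_RATE  # two seconds of time values

carrier_freq = 440.0   # the tone we actually hear
mod_freq = 5.0         # a slow 5 Hz modulator

# Amplitude modulation: multiply the carrier by a (scaled) modulator.
# With a 5 Hz modulator this is audible as tremolo.
modulator = 0.5 * (1 + np.sin(2 * np.pi * mod_freq * t))  # ranges 0..1
am = modulator * np.sin(2 * np.pi * carrier_freq * t)

# Frequency modulation: use the modulator to push the carrier's phase around.
# With a 5 Hz modulator this is audible as vibrato.
mod_index = 5.0  # how far the frequency swings
fm = np.sin(2 * np.pi * carrier_freq * t
            + mod_index * np.sin(2 * np.pi * mod_freq * t))
```

Crank the modulator up to audio rates and FM starts generating new upper partials instead of vibrato, which is the basis of classic FM synthesis.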
Phase Cancellation
So if we have a wave and an inverted copy of it, and play them on top of each other, what happens?
They cancel each other out and you get silence. At least in theory.
This works nicely in theory and in software, where all the math is safely contained in logic and memory. What about in the real world? Yes, you get the same effect in the real world, although perfect cancellation doesn't really happen in practice. This is how noise-canceling headphones work: they have a microphone recording the outside world, which is used to generate an inverse signal that dampens the audio leaking into the user's ear.
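In software, the cancellation is trivially easy to demonstrate; here's a tiny Python/NumPy sketch (my own toy example):

```python
import numpy as np

SAMPLE_RATE = 44100
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE

wave = np.sin(2 * np.pi * 440.0 * t)   # a 440 Hz tone
inverted = -wave                       # the same wave, flipped in polarity

mixed = wave + inverted                # "playing them on top of each other"
print(np.max(np.abs(mixed)))           # 0.0 -- perfect silence, in software at least
```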
Upper Partials and Timbre
Unless a tone is a pure sine wave, it has other sine waves of varying volumes added in, and these other sine waves will be at integer multiples of the base frequency. These are known as overtones, or upper partials – with the original, lowest-frequency tone being called the fundamental frequency.
These overtones are what give tones their different character. This character, or audio texture, is called timbre. And for the LOVE OF GOD, if you ever pronounce it, pronounce it right – because it’s not pronounced the same way as timber.
It’s pronounced tam-ber. Audio below. It’s not some song sung by Pitbull and Ke$ha.
Examples
So just to beat in the idea like a dead horse, here are a few samples of the same jingle on a single instrument, but with different timbres.
There’s more going on between these samples than just timbre changes, such as complex effect chains, volume envelopes, and LFOs, but I’m sure you get the idea.
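Since I can't embed those audio samples as text, here's a rough Python/NumPy sketch of the underlying idea: two tones with the same 220 Hz fundamental but different overtone recipes, which is (roughly) what a timbre difference boils down to. The amplitude lists are made up purely for illustration:

```python
import numpy as np

SAMPLE_RATE = 44100

def tone_from_partials(fundamental, partial_amps, duration=1.0):
    """Build a tone from a fundamental plus weighted integer-multiple overtones."""
    t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
    wave = np.zeros_like(t)
    for harmonic, amp in enumerate(partial_amps, start=1):
        wave += amp * np.sin(2 * np.pi * harmonic * fundamental * t)
    return wave / np.max(np.abs(wave))  # normalize to -1..1

# Same pitch (220 Hz), different overtone balance -> different timbre.
mellow = tone_from_partials(220.0, [1.0, 0.3, 0.1, 0.05])           # weak upper partials
buzzy  = tone_from_partials(220.0, [1.0, 0.9, 0.8, 0.7, 0.6, 0.5])  # strong upper partials
```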
Wavelets
Note that the sound we hear in the real world is not just an indefinitely occurring, perfectly repeating tone. Its frequency content changes over time. A spectrogram is a good way to visualize that.
Below are two examples: on the left, a simple tone (an A4 sine wave), and on the right, a real-world, unprocessed voice recording. Below each spectrogram is the waveform for the sample and a zoom-in showing the actual wave structure.
Note how the simple A4 sine wave is just a straight line in the spectrogram – while the complex voice recording comprises innumerable contributing frequencies whose presence and contribution wax and wane over time.
These representations of a wave's frequency contributions over only a tiny window of time are known as wavelets. Sometimes, depending on the context, they're also known as grains.[microsound]
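For the curious, a spectrogram like the ones described above is essentially a series of Fourier transforms taken over short windows of time. Here's a bare-bones Python/NumPy sketch of the idea (real tools add frame overlap and other refinements):

```python
import numpy as np

SAMPLE_RATE = 44100

def simple_spectrogram(signal, frame_size=1024):
    """Chop the signal into short frames and take the FFT of each one.

    Returns a 2D array: rows are time frames, columns are frequency bins.
    (No overlap between frames -- real implementations usually overlap them.)
    """
    window = np.hanning(frame_size)
    num_frames = len(signal) // frame_size
    frames = []
    for i in range(num_frames):
        chunk = signal[i * frame_size:(i + 1) * frame_size] * window
        spectrum = np.abs(np.fft.rfft(chunk))  # magnitude of each frequency bin
        frames.append(spectrum)
    return np.array(frames)

# An A4 (440 Hz) sine shows up as a single bright band of bins that stays
# constant across every time frame -- the "straight line" described above.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
spec = simple_spectrogram(np.sin(2 * np.pi * 440.0 * t))
```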
Ending
Next time we’ll get into how digital audio is represented.
– Stay strong, code on. William Leu
Explore more articles here.
Explore more audio articles here.