Streaming Microphone Audio To Unity WebGL

This is going to cover the steps I used to get the microphone working for the granulation demo.
The GitHub repo for that demo can be found here.

NOTE: Before reading further, please note that this only covers recording the microphone from the browser’s Unity WebGL app. It does NOT cover streaming audio playback or internet transmission of audio!

A caveat before we begin: the content here is not a robust solution, not only because it’s just a sample to point people in the right direction and get them started, but also because the behavior is finicky.

The Process In A Nutshell

So what are the individual parts involved in recording from the microphone, transferring the data to Unity, and playing that audio in Unity? Let’s list the things that need to happen:

  1. The machine needs to have a microphone attached, and the browser needs to support the media API we’re using.
  2. JavaScript needs to request the use of the microphone.
  3. The user needs to accept the request, allowing the browser to grant the webpage’s JavaScript permission to do the requested microphonalyzing (technical term).
  4. While the microphone is active, we need a way to listen in on the streaming audio data (PCM data) from the microphone.
  5. We need to transfer this PCM data from JavaScript to the Unity application.
  6. (Optional) If we want to play the audio back in the Unity WebGL application, we then need to shove the PCM data into an AudioClip, shove that AudioClip into an AudioSource, and then we can play the AudioSource. Although we can’t do this with streaming data, because as mentioned before, streaming audio data isn’t supported in Unity WebGL.

Setting Up Unity To Receive Streaming Data

We’ll start with the Unity class that will wrap the mic.

For brevity, some code is omitted: extra state-keeping, utilities, and the preprocessor branching that lets the WebMic class also wrap the Microphone class when the application is not running in the browser.
You know, the Microphone class. The thing that’s missing entirely from the WebGL builds, which is why we’re here.
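As a rough idea, that branching follows the usual Unity pattern of #if UNITY_WEBGL && !UNITY_EDITOR blocks. Here's a minimal sketch of how a method like StartRecording might branch; this is my own illustration, not the demo's exact code:

// Illustrative sketch of the omitted preprocessor branching - not the demo's exact code.
public bool StartRecording()
{
#if UNITY_WEBGL && !UNITY_EDITOR
    // In the browser, defer to the JavaScript plugin functions.
    this.recordingState = State.Booting;
    Recording_Start();
    return true;
#else
    // Everywhere else, fall back to Unity's built-in Microphone class.
    this.RecordingClip = Microphone.Start(null, true, 10, FreqRate);
    return this.RecordingClip != null;
#endif
}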

// C# MonoBehaviour singleton - we're going to assume it's in a GameObject called "Managers"

public class WebMic : MonoBehaviour
{
    ...
    
    [DllImport("__Internal")]
    public static extern void Recording_Start();    // Calls JS StartMic

    [DllImport("__Internal")]
    public static extern void Recording_Stop();     // Calls JS StopMic
    
    [DllImport("__Internal")]
    // Hand JavaScript the pointer to the freshly allocated PCM destination buffer.
    public static extern bool Recording_UpdatePointer(float [] idx);
    
    public const int FreqRate = 44100; // Hardcoded for example
    
    public enum State
    { 
        Booting,
        NotActive,
        Recording
    }
    
    // The destination for the next streaming PCM
    private struct FloatArray
    { 
        public float [] buffer;
        public int written;         // The amount of this.buffer that's usable.
    }
    
    const int BufferSize = 2048; // Needs to match Javascript
    
    // PCM we've had streamed to us but haven't processed.
    private List<FloatArray> binaryStreams = new List<FloatArray>();
    
    FloatArray currentBuffer;
    
    State recordingState = State.NotActive;
    
    // ...  There's stuff we're glossing over in this class
    
    public void Awake()
    {
        this.currentBuffer = new FloatArray();
        this.currentBuffer.buffer = new float[BufferSize];
        Recording_UpdatePointer(this.currentBuffer.buffer);
    }
    
    // Called from JS to notify us that currentBuffer.buffer has been (or is about to be)
    // written to; it should be processed or stored, and a new PCM buffer should be
    // prepared in this.currentBuffer's place.
    public void LogWrittenBuffer(int written)
    { 
        if(this.recordingState != State.Recording)
            return;

        this.currentBuffer.written = written;
        this.binaryStreams.Add(this.currentBuffer);

        this.currentBuffer = new FloatArray();
        this.currentBuffer.buffer = new float[BufferSize];
        Recording_UpdatePointer(this.currentBuffer.buffer);
    }
    
    public void NotifyRecordingChange(int newRS)
    { 
        if((int)this.recordingState == newRS)
            return;

        State oldState = this.recordingState;
        this.recordingState = (State)newRS;
    }
    
    public bool StartRecording()
    {
        if(this.recordingState != State.NotActive)
            return false;

        this.recordingState = State.Booting;

        Recording_Start();

        this.RecordingClip = null;
        return true;
    }

    public void StopRecording()
    {
        Recording_Stop();
    }
}

Well, there’s a lot to unpack here. The most important things to note are the Recording_UpdatePointer plugin function, the FloatArray struct, and the currentBuffer variable.

// Excerpts from above code

[DllImport("__Internal")]
// Hand JavaScript the pointer to the freshly allocated PCM destination buffer.
public static extern bool Recording_UpdatePointer(float [] idx);

...

// The destination for the next streaming PCM
private struct FloatArray
{ 
    public float [] buffer;
    public int written;         // The amount of this.buffer that's usable.
}

...

FloatArray currentBuffer;

These things are used to stage float arrays that JavaScript can stream data directly into. The FloatArray struct holds a float array (float[]) that sits at a stable location in heap memory (until it’s garbage collected).
By “stable,” I mean its position in memory won’t change.

The currentBuffer has the float[] that the JavaScript microphone code can fill data into, and Recording_UpdatePointer provides the pointer to currentBuffer‘s buffer to JavaScript.

After the JavaScript microphone code has filled that buffer, it calls LogWrittenBuffer() to tell our Unity app that the PCM has been delivered and that a new PCM destination needs to be prepared for the next batch of streamed data.

The Unity WebGL Plugin Code

Here’s the Unity WebGL plugin code to define those externed functions.

mergeInto(LibraryManager.library, 
{

	Recording_Start: function () { StartMic(); },
	Recording_Stop: function() { StopMic();},

	Recording_UpdatePointer: function(idx)
	{
		floatPCMPointer = idx;
	}
})

There’s some simple relaying of function calls for Start and Stop – nothing to write home about.

And then there’s Recording_UpdatePointer, which takes in a float array and assigns it to the JavaScript global variable floatPCMPointer. The value we receive for the float array, idx, will actually be a number: it’s the memory address of the float[] buffer in the WebAssembly heap.

Now when we prepare a destination for JavaScript to write PCM data to, it will know where to write.
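If it helps, here's a small illustrative snippet (not part of the demo) showing what that number really is. The pointer is a byte offset into the Unity heap's ArrayBuffer, so the same samples can also be viewed through HEAPF32 by dividing the offset by four:

// Illustrative only - what the received pointer actually is.
// floatPCMPointer is a byte offset into the WebAssembly heap.
var view = new Float32Array(unityInstance.Module.HEAP8.buffer, floatPCMPointer, 2048);
var firstSample = unityInstance.Module.HEAPF32[floatPCMPointer >> 2]; // same as view[0]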

The Webpage Code

I’ll gloss over most of the microphone setup code because it’s a modified version of this Mezintaou blog post – and his blog post explains the microphone listening code well.

var audioContext = null;
var audioInput = null;
var microphone_stream = null;
var recorder = null;

// The pointer to write PCM data directly into the Unity app's
// memory heap.
var floatPCMPointer = -1;

// These need to match the enums in the WebMic.cs file
const MicState =
{
	Booting: 	0,
	NotActive: 	1,
	Recording: 	2
}

// Starts recording from a microphone.
// called from WebMic.jslib's Recording_Start()
function StartMic()
{
	unityInstance.SendMessage("Managers", "NotifyRecordingChange", MicState.Booting);
	
	if (!navigator.getUserMedia)
	{
		navigator.getUserMedia =
			navigator.getUserMedia ||
			navigator.webkitGetUserMedia ||
			navigator.mozGetUserMedia ||
			navigator.msGetUserMedia;
	}

	if (navigator.getUserMedia)
	{
		navigator.getUserMedia(
			{ audio: true },
			function (stream)
			{
				start_microphone(stream);
			},
			function (e)
			{
				unityInstance.SendMessage("Managers", "NotifyRecordingChange", MicState.NotActive);
				alert('Error capturing audio.');
			}
		);
	}
	else
	{
		alert('getUserMedia not supported in this browser.');
		unityInstance.SendMessage("Managers", "NotifyRecordingChange", MicState.NotActive);
	}
}

// Callback worker for StartMic().
function start_microphone(stream)
{
    // Make sure sample rate matches Unity's
	audioContext = new AudioContext({"sampleRate": 44100});
	microphone_stream = audioContext.createMediaStreamSource(stream);

	const bufferSize = 2048;			// This must agree with the buffer size for WebMic
	const numberOfInputChannels = 1;	// Mono audio
	const numberOfOutputChannels = 1;	// Don't care about this
	if (audioContext.createScriptProcessor) 
	{
		recorder = audioContext.createScriptProcessor(bufferSize, numberOfInputChannels, numberOfOutputChannels);
	} 
	else 
	{
		recorder = audioContext.createJavaScriptNode(bufferSize, numberOfInputChannels, numberOfOutputChannels);
	}

	recorder.onaudioprocess = function (e) 
	{
		dstPtr = floatPCMPointer;
		floatPCM = e.inputBuffer.getChannelData(0);
		unityInstance.SendMessage("Managers", "LogWrittenBuffer", floatPCM.length);

		writeTarg = new Float32Array(unityInstance.Module.HEAP8.buffer, dstPtr, bufferSize);
		writeTarg.set(floatPCM);
	}

	// we connect the recorder with the input stream
	microphone_stream.connect(recorder);
	recorder.connect(audioContext.destination)

	unityInstance.SendMessage("Managers", "NotifyRecordingChange", MicState.Recording);
}

// called from WebMic.jslib's Recording_Stop()
function StopMic()
{
	if(audioContext == null)
		return;
		
	recorder.disconnect(audioContext.destination);
	microphone_stream.disconnect(recorder);
	
	audioContext = null;
	recorder = null;
	microphone_stream = null;

	unityInstance.SendMessage("Managers", "NotifyRecordingChange", MicState.NotActive);
}

There are a few extra things here. The first is the MicState enum, which is used to notify the Unity app when the microphone state changes on the JavaScript side. Since we’re piggybacking off JavaScript for the microphone, JavaScript holds the recording state and is in charge of syncing that state with the Unity app.

Besides microphone states for recording and not-recording, there’s also a third state, Booting, which essentially means “I’m about to start recording; not actually recording yet, but it’s about to happen! But also, for whatever reason, things could just fail.” Except with shorter terminology.
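On the Unity side, code that kicks off recording can wait out the Booting state before trusting that samples are actually flowing. Here's a minimal sketch of that idea (my own illustration; it assumes WebMic exposes its state through a public RecordingState property, which the excerpt above keeps private):

// Illustrative coroutine - not part of the demo code. Assumes a public
// RecordingState property that mirrors WebMic's private recordingState field.
IEnumerator WaitForMicActive(WebMic mic)
{
    mic.StartRecording();   // Moves the mic into State.Booting

    // Wait until the JavaScript side reports back via NotifyRecordingChange().
    while (mic.RecordingState == WebMic.State.Booting)
        yield return null;

    if (mic.RecordingState != WebMic.State.Recording)
        Debug.LogWarning("Microphone failed to start.");
}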

Transferring The PCM To The Unity WebGL Application

Here’s a snippet from the above code where the magic happens – where audio is streamed to Unity.

recorder.onaudioprocess = function (e) 
{
	dstPtr = floatPCMPointer;
	floatPCM = e.inputBuffer.getChannelData(0);
	unityInstance.SendMessage("Managers", "LogWrittenBuffer", floatPCM.length);

	writeTarg = new Float32Array(unityInstance.Module.HEAP8.buffer, dstPtr, bufferSize);
	writeTarg.set(floatPCM);
}

The first part caches the original floatPCMPointer and tells Unity to store the PCM buffer. It’s a bit premature because we haven’t written to it yet, but the float[] is on the heap, in stable memory, so it’s not going anywhere. Because LogWrittenBuffer() immediately swaps in a new buffer and points floatPCMPointer at it, we need to reference the cached dstPtr instead of floatPCMPointer for the rest of the data transfer.

writeTarg = new Float32Array(unityInstance.Module.HEAP8.buffer, dstPtr, bufferSize);
writeTarg.set(floatPCM);

And now for the crust of the biscuit, writing directly to Unity heap memory. For a Unity WebGL instance (these days named unityInstance in the sample build by default) there are several large typed arrays called HEAP*. They represent the actual binary data that makes up the app’s heap. While there are several differently typed HEAP* members, they all point to the same underlying block of memory, buffer, which is an ArrayBuffer.
To reiterate, HEAP*.buffer is not some staging buffer for transferring data between JavaScript and Unity; it’s literally the memory of the Unity WebAssembly program. We will be writing values directly into our float array in currentBuffer.
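A quick way to convince yourself of that (illustrative, not part of the demo) is to check in the browser console that the different heap views share one ArrayBuffer:

// Illustrative console check - every HEAP* view shares the same ArrayBuffer.
unityInstance.Module.HEAPF32.buffer === unityInstance.Module.HEAP8.buffer;  // true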

We make a Float32Array with that buffer, but offset it so that it starts referencing the ArrayBuffer’s memory at our float[]’s pointer. Then we call its set() method with the microphone’s PCM chunk (which is also a Float32Array) to copy all of its contents into the Unity app’s heap.

And afterward, we don’t need to worry about setting the pointer and allocating the buffer for the next streaming PCM because we already did that at the start of the function.

Playing The Audio In Unity

When I need to play the audio in Unity, I concatenate every PCM array I have and shove it into an AudioClip. To demonstrate this, we’re going to add more stuff back into WebMic to implement FlushDataIntoClip().

public class WebMic : MonoBehaviour
{
    // ...
    
    public bool ClearRecording()
    {
        if (this.binaryStreams.Count == 0)
            return false;

        this.binaryStreams.Clear();

        return true;
    }
    
    // Concatenate all binary streams to a single array.
    public float [] GetData(bool clear = true)
    { 
        int fCt = 0;
        foreach(FloatArray fa in this.binaryStreams)
            fCt += fa.written;

        float [] ret = new float[fCt];

        int write = 0;
        foreach(FloatArray fa in this.binaryStreams)
        { 
            System.Buffer.BlockCopy(fa.buffer, 0, ret, write * 4, fa.written * 4);
            write += fa.written;
        }

        if (clear == true)
            ClearRecording();

        return ret;
    }

    AudioClip FlushDataIntoClip()
    {
        float[] pcm = this.GetData();
        if (pcm != null && pcm.Length > 0)
        {
            AudioClip ac = AudioClip.Create("", pcm.Length, 1, FreqRate, false);
            ac.SetData(pcm, 0);
            return ac;
        }
        return null;
    }
}
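To round things out, here's a sketch of how playback might be triggered (my illustration, not from the demo; it assumes FlushDataIntoClip() is made public and that an AudioSource lives on the same GameObject as WebMic):

// Illustrative usage - not part of the demo code. Assumes FlushDataIntoClip()
// is public and an AudioSource component sits on the same "Managers" GameObject.
public void StopAndPlayback()
{
    WebMic mic = this.GetComponent<WebMic>();
    mic.StopRecording();

    AudioClip clip = mic.FlushDataIntoClip();
    if (clip != null)
    {
        AudioSource src = this.GetComponent<AudioSource>();
        src.clip = clip;
        src.Play();
    }
}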

As mentioned before, a big issue with Unity WebGL audio is that streaming audio isn’t supported. So while we’re receiving streaming microphone input, we can’t have streaming audio output.

So why support the microphone at all in your Unity web app if you can’t stream it back out as audio? I dunno; that’s for you to figure out. But if you need to hear the streaming audio in realtime, you can easily set up the speakers to play the microphone audio in JavaScript when setting up the AudioContext.
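For example, a minimal way to monitor the microphone (illustrative; not in the demo) is to connect the source node straight to the output inside start_microphone():

// Illustrative only - route the mic straight to the speakers for live monitoring.
// Goes inside start_microphone(), after createMediaStreamSource().
// (Watch out for feedback if the speakers are near the microphone.)
microphone_stream.connect(audioContext.destination);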

Issues and Additions

A lot of fundamental querying and parameterization available with the normal Microphone class is missing. This could be added and wrapped. While it’s currently out of scope for me, I was eyeballing this UnityWebGLMicrophone class, which looks like it already has code to implement those things.

When attempting to record from the microphone, the request will fail if a mic isn’t attached, if permission is denied, or if the microphone is otherwise unavailable. A proper implementation should check for these things.
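A minimal sketch of one such check (illustrative; it uses the standard navigator.mediaDevices API rather than the older getUserMedia shim the demo uses):

// Illustrative check - confirm an audio input device exists before trying to record.
function HasMicrophone(onResult)
{
	if (!navigator.mediaDevices || !navigator.mediaDevices.enumerateDevices)
	{
		onResult(false);
		return;
	}

	navigator.mediaDevices.enumerateDevices().then(function (devices)
	{
		var hasMic = devices.some(function (d) { return d.kind === "audioinput"; });
		onResult(hasMic);
	});
}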

And lastly, like all things Unity WebGL audio-related, things are just finicky and not exactly stable.

– Stay strong, code on. William Leu
