Transferring Binary Files Between Unity and Browser

In a recent post, I had a font-parsing Unity WebGL application receive binary data, either through an “Upload” button that brings up a file selection dialog or by dragging and dropping a file onto the app in the browser. These TrueType Font (*.TTF) files are binary, so how were they transferred from a user’s computer to the Unity WebGL app?

Previous posts covered how to transfer a text file from the browser to a Unity web app, and how to transfer a file from the Unity web app to the browser.
When saying “the browser”, we’re talking about both the webpage outside of the Unity app and the file system of the machine the browser is running on, i.e., uploading and downloading files.

But those posts dealt with XML data that was represented as a string. In the Unity WebGL plugins documentation, we see examples of integers and strings being passed between the webpage and the Unity app. What about binary data?

TL;DR

This lengthy article has a lot of prerequisite bloat, so if you have the technical savvy and prerequisite knowledge to figure out the rest, here’s the answer in a nutshell so you can skip the rest (spoiler alert!):
To pass binary file data, use normal string parameters on Unity WebGL plugin functions: Base64 encode the binary data into a string and pass that string value. Then decode it back to binary on the other side.

Text vs Binary

Before going further, I want to talk a little about the difference between text data and binary data. If you’re familiar with the differences then skip this section.

In the end, everything is actually binary data. More specifically, everything is broken down and represented as a chain of bits, grouped into clusters of 8 (a byte). A byte, when unsigned (i.e., when we don’t spend a bit to differentiate between positive and negative numbers), can represent integer values between 0 and 255. We can cluster bytes together to represent even larger numbers, or interpret them differently to represent other types of numbers or data.
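To make the idea of clustering and reinterpreting bytes concrete, here is a small JavaScript sketch. The four byte values are arbitrary and chosen only for illustration; a DataView is used to read the same bytes in two different ways.

```javascript
// Four arbitrary bytes, chosen for illustration only.
const fourBytes = new Uint8Array([0x00, 0x00, 0x80, 0x3f]);
const view = new DataView(fourBytes.buffer);

// Clustered together as one 32-bit little-endian unsigned integer.
const asUint32 = view.getUint32(0, true);

// The very same bits interpreted as a 32-bit little-endian float.
const asFloat32 = view.getFloat32(0, true);

console.log(Array.from(fourBytes), asUint32, asFloat32);
```

The bytes never change; only the interpretation applied to them does.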

Text encodings like ASCII, or the UTF encodings, map a number (from a byte, or multiple bytes) to a character.

File:ASCII-Table.svg
An ASCII chart that maps number values for a byte into a character value (the “Char” column). Most characters are tied to a graphical letter or punctuation symbol called a glyph. Some are invisible but represent some kind of spacing, command, or event. Note how even though a byte can represent 256 distinct values (0 to 255), ASCII only uses 128 of them (0 to 127).
Image from Wikimedia Commons.

So what’s all the hullabaloo about text data being different if it’s just a string of binary like any other chunk of data? Well, if it’s text data, there are some caveats, and we give it some special treatment; which causes a functional breakdown if we treat non-text data like text or vice versa:

  • Not all binary numbers may be tied to a character code – this is especially important if we end up wanting to map a number value to a text code and then later remap it back to the original number value; there are holes in this 2-way mapping scheme.
  • A number value of 0 is very common in binary – but for text, it is illegal or represents null/void in some schemes, or a sentinel (i.e., the end of a text string).
  • A newline and a carriage return are significant values for text data. Text data streams and file streams treat these characters as an important value to stop on (a mid-string sentinel, if you will), while binary interpreters see these values like any other numbers.
    • These are invisible characters created when you press the ENTER or RETURN key in a word processor, signalling that the text cursor should move to the start of the next line.
    • The carriage return is ASCII code 13, and the newline character is ASCII code 10.
    • In programming, you may recognize these as the escaped '\r' and '\n' characters, and the "\r\n" sequence.

These factors make it incorrect to use a binary stream or interpretation for text data, and a text stream or interpretation for binary data.
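As a rough illustration of the “holes” in the mapping, the sketch below pushes a few arbitrary bytes through a UTF-8 text interpretation and back. The byte values are made up for the example, and it assumes a runtime with the standard TextDecoder and TextEncoder (modern browsers and recent Node.js versions).

```javascript
// Arbitrary binary data: 0xFF is not valid UTF-8, 0 is a common string
// terminator, and 13 / 10 are the carriage return and newline codes.
const original = new Uint8Array([0xff, 0x00, 0x0d, 0x0a]);

// Interpret the bytes as UTF-8 text, then turn that text back into bytes.
const asText = new TextDecoder("utf-8").decode(original);
const roundTripped = new TextEncoder().encode(asText);

// Compare the result with what we started from.
const survived = original.length === roundTripped.length &&
    original.every(function (value, i) { return value === roundTripped[i]; });

console.log(Array.from(original), Array.from(roundTripped), survived);
```

The invalid byte is swapped for a replacement character during decoding, so the bytes that come back are not the bytes that went in.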

Base64

There’s a standard that allows representing binary data as text strings, called Base64. It’s an encoding scheme that’s used in a lot of standard JavaScript APIs that deal with data streams. I won’t delve too much into the weeds, but the basics are:

  • Any arbitrary binary data can be Base64 encoded into an ASCII string.
  • The string can be decoded back into the original binary data.
  • The string will be about one third larger than the original binary data. For example, a 1-megabyte binary array would be encoded into a ~1.33-megabyte (ASCII) string.
  • The string encoding only uses character codes for ASCII capital letters ('A'-'Z'), lowercase letters ('a'-'z'), digits ('0'-'9') and the '+', '/' and '=' symbols.
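The size and alphabet properties above can be sketched in a few lines of JavaScript. Both helper names are hypothetical, and the sketch assumes standard padded Base64, where every group of 3 bytes (including a final partial group) becomes 4 characters.

```javascript
// Length of the Base64 text for a given number of input bytes,
// assuming standard padded Base64 (3 bytes in, 4 characters out).
function base64EncodedLength(byteCount) {
    return 4 * Math.ceil(byteCount / 3);
}

// Checks only the character set: A-Z, a-z, 0-9, '+' and '/',
// with at most two '=' padding characters at the very end.
function usesOnlyBase64Characters(text) {
    return /^[A-Za-z0-9+\/]*={0,2}$/.test(text);
}

const oneMegabyte = 1000000;
const encodedLength = base64EncodedLength(oneMegabyte);

console.log(encodedLength, encodedLength / oneMegabyte);
console.log(usesOnlyBase64Characters("SGk="), usesOnlyBase64Characters("a b!"));
```

Note that usesOnlyBase64Characters is a character check, not a full validator; it does not verify that the length is a multiple of 4.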

Why go through the trouble of turning binary data into an entirely different form that’s more bloated, only to turn it back into binary data later?

  • You can explicitly use it in text formats like HTML. For example, with the img tag, instead of specifying the source file path of an image, you can specify the binary content as a Base64 string. Not only can you do this in a text editor, but it also allows JavaScript code to modify images in the DOM the same way.
  • For security or data restriction reasons, systems may not allow you to store data on their server or database unless it’s in a string form. This allows for a loophole of storing binary data but in string form.
    • Just spitballing, but I’m guessing if you limit the data to only a small set of characters, you greatly decrease a person’s ability to upload data that’s secretly arbitrary malware code for stuff like buffer overrun exploits.
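For the img tag case, a minimal sketch of what such an embedded source looks like is below. The helper name toDataUrl is hypothetical, and the payload is a placeholder (the Base64 of the bytes 1, 2, 3), not a real image.

```javascript
// Build a data URL from a MIME type and a Base64 string.
function toDataUrl(mimeType, base64) {
    return "data:" + mimeType + ";base64," + base64;
}

// Placeholder payload: NOT a real image, just the Base64 of bytes 1, 2, 3.
const placeholderBase64 = "AQID";

// The data URL goes where a file path would normally go.
const imgTag = '<img src="' + toDataUrl("image/png", placeholderBase64) + '">';

console.log(imgTag);
```

In a page, the same string could be assigned to an existing image element's src property from JavaScript.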

The Solution Is Simple

So we can transfer string data between a Unity web app and the browser, in both directions, and we want to transfer binary data. And we know that we can freely convert between binary and Base64, which is a string. The solution is staring us in the face:
We pass around strings that are Base64 encoded binary data!

Retrieval and Conversion

So it’s established that we’re going to use normal string transferring between Unity and the browser to transfer binary data between them – and to encode the binary data as a string we’re going to use the Base64 scheme. Let’s cover how we would convert back-and-forth for various situations.

Unity Encoding and Decoding

Encode binary data into a Base64 string.

To encode binary data to Base64 string to send to the browser:

  • Convert your data to a byte array (byte[]).
  • And then convert that byte array to a Base64 string with Convert.ToBase64String.
  • Call your WebGL plugin function (an extern function implemented in JavaScript) with the string as a parameter. On the JavaScript side, turn the received pointer into a JavaScript string with the standard Pointer_stringify().

I haven’t looked into turning Base64 back into binary data on the JavaScript side (by this I mean a Uint8Array). One might think atob() could be used, but that’s as useful as btoa (which is not useful at all). That said, there are some things in HTML5 that can take a Base64 string directly, like the img tag’s source attribute mentioned earlier.
Pretty much any HTML multimedia tag that takes a file path as its source can take a Base64 string instead (img, audio, svg, etc.).

Getting The Browser To Download a Base64 String As a Binary File

If we have data in our Unity WebGL app that we want the web browser to download, we can do a similar process used for downloading text assets from the WebGL app. First, a link is created with the file contents embedded into it. And then we simulate a click that coerces the browser to download it.

// Unity jslib plugin.

mergeInto(LibraryManager.library, 
{
    BrowserTextDownload: function(filename, base64Data)
    {
        // https://ourcodeworld.com/articles/read/189/how-to-create-a-file-and-generate-a-download-with-javascript-in-the-browser-without-a-server
        
        // Convert parameters to the correct form. See the Unity WebGL Plugins page
        // for more information. It's not too important to understand why you need
        // to do this, as long as you know THAT you need to.
        var strFilename = Pointer_stringify(filename);
        var strContent = Pointer_stringify(base64Data);

        var element = document.createElement('a'); // Create an <a></a> node.
        
        // Set the link destination as hard-coded file data.
        element.setAttribute('href', 'data:application/octet-stream;base64,' + encodeURIComponent(strContent));
        element.setAttribute('download', strFilename);
        
        
        element.style.display = 'none';     // Make sure it's not visible when added to the HTML body
        document.body.appendChild(element); // Activate it by adding it to the HTML body
        element.click();                    // Don't wait for the user to click it, activate it ourselves!
        document.body.removeChild(element); // Clean up our mess, now that the anchor's purpose is finished.
    },
});

Decode a Base64 string into binary data.

To decode a Base64 string back to binary data in Unity:

  • Convert the Base64 encoded string given to you from the browser into a byte array with Convert.FromBase64String.
  • And then pull out and convert the data from the byte array to the correct types.
    • Use BitConverter functions to convert certain indexes of the array to a typed value.
    • Use Buffer.BlockCopy to copy the byte array into an array of another type.

// C# MonoBehaviour sample

public class YourDataReceiver : MonoBehaviour
{
    ...
    
    public void FileUploadDirect(string base64Data)
    {
        // This is assuming the JavaScript that gave us the string didn't
        // strip out the metadata. If it did, the Substring isn't needed.
        int comma = base64Data.IndexOf(',');
        base64Data = base64Data.Substring(comma + 1);
    
        byte [] rb = System.Convert.FromBase64String(base64Data);
        this.LoadBinaryData("", rb);
    }
    
    ...
}

// JavaScript to upload to Unity WebGL app
let base64EncData = GetUploadedBinaryDataAsBase64String(); // Hand-wavey enough?

unityInstance.SendMessage(
    'GameObjectWith_YourDataReceiver', 
    'FileUploadDirect', 
    base64EncData);

JavaScript Data APIs

If you’re retrieving file data from JavaScript APIs, they will often provide the Base64 string for you.

Given the cookie-cutter embedding of a Unity WebGL application:

// In the webpage's source file
<script>
      var unityInstance = UnityLoader.instantiate("unityContainer", "Build/FontViewer.json", {onProgress: UnityProgress});
</script>
      ...
      <div id="unityContainer" style="width: 960px; height: 600px"></div>
      ....

Get a Browser’s File Selection Dialog File to Base64

// Input field in the GUI
<input type="file" onchange="FUpload(this)">
...
<script>
function FUpload(ipt)
{
	if(ipt.value == "")
		return;
		
	let reader = new FileReader();
	reader.onloadend = function(evt) 
	{
		if (evt.target.readyState != 2)
			return;

		if (evt.target.error) {return; /*TODO: Error handling. */}

		// FileUploadDirect() is a public method on a MonoBehaviour attached
		// to a GameObject named Managers; SendMessage invokes it by name.
		//
		// evt.target.result is the Base64 encoded string, but be aware it has
		// MIME data prefixed to it that needs to be removed before it can be
		// decoded.
		unityInstance.SendMessage("Managers", "FileUploadDirect", evt.target.result);
	}
	//Note the use of readAsDataURL as opposed to other readAs* variants.
	reader.readAsDataURL(ipt.files[0]); 
}
</script>

Note the use of FileReader.readAsDataURL(). This specific readAs* function is needed to make the returned value of evt.target.result a Base64 string.

If you inspect what evt.target.result returns (for example, by logging it to the browser console), be careful with large files as they can bog down the browser.

Get a Browser’s Drag-and-Drop File to Base64

We can make the Unity app a drag-and-drop area, that sends the Base64 data to the Unity app with the FileReader and the onloadend’s result, which evaluates to a Base64 string of the file’s binary contents.

// In the webpage's source file
<script>
var dropArea = document.getElementById("unityContainer");

// All 4 significant drag functions must have preventDefault() called inside
// to override the browser's default drag-drop files behaviour.
//
// See https://www.digitalocean.com/community/tutorials/js-drag-and-drop-vanilla-js for more info.
dropArea.addEventListener("dragenter", 	Handle_DragEnter, false);
dropArea.addEventListener("dragleave", 	Handle_DragLeave, false);
dropArea.addEventListener("dragover", 	Handle_DragOver, false);
dropArea.addEventListener("drop", 		Handle_Drop, false);

function Handle_DragEnter (e){e.preventDefault();}
function Handle_DragLeave (e){e.preventDefault();}
function Handle_DragOver (e){e.preventDefault();}
function Handle_Drop (e) 
{
	e.preventDefault();
	loadFile(e.dataTransfer.files[0]); // Ignore multi-file drops for this sample.
}
function loadFile(file)
{
	let reader = new FileReader();
	// Assign the handler before starting the read.
	reader.onloadend = function(e)
	{
		if (e.target.error) {return; /*TODO: Error handling. */}

		// FileUploadDirect() is a public method on a MonoBehaviour attached
		// to a GameObject named Managers; SendMessage invokes it by name.
		//
		// e.target.result is the Base64 encoded string, but be aware it has
		// MIME data prefixed to it that needs to be removed before it can be
		// decoded.
		unityInstance.SendMessage("Managers", "FileUploadDirect", e.target.result);
	};
	reader.readAsDataURL(file);
}
</script>

The results are the same as uploading data through the input field. One caveat with drag-and-drop: if the user misses the drop target, the browser’s default behaviour takes over and the tab navigates away to open the file. And again, be careful with large files.

Removing the MIME Data

When looking at the Base64 data, you will notice MIME data prefixed to the front, for example:

data:application/octet-stream;base64,AAEAAAASAQA...

Given this example data, everything before AAEA is just metadata telling us what kind of file it is. When converting the data back into binary, this is not actual binary data and should be stripped out. This can be done from either the JavaScript side or the Unity app side by removing everything before-and-including the comma.
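On the JavaScript side, stripping the prefix can be sketched as below. The helper name stripDataUrlPrefix is hypothetical, and the sample strings use a short placeholder payload instead of real file contents. Since a Base64 payload never contains a comma, cutting at the first comma is safe.

```javascript
// Remove everything up to and including the first comma of a data URL.
function stripDataUrlPrefix(dataUrl) {
    const comma = dataUrl.indexOf(",");
    // indexOf returns -1 when there is no comma, so slice(0) then keeps
    // the whole string, which is what we want for already-stripped input.
    return dataUrl.slice(comma + 1);
}

// Placeholder payload ("AQID" is the Base64 of bytes 1, 2, 3).
const withPrefix = "data:application/octet-stream;base64,AQID";
const stripped = stripDataUrlPrefix(withPrefix);
const alreadyStripped = stripDataUrlPrefix("AQID");

console.log(stripped, alreadyStripped);
```

Doing this before calling SendMessage also shortens the string that has to cross into the Unity app, though only by the length of the prefix.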

JavaScript Conversions

What if we’re in a situation where we have the binary data and want it as a Base64 string, but we don’t have access to a data stream object that will convert it for us?

The most efficient and robust way I’ve found is to wrap the data in a Blob and pass it to a FileReader, which then produces the data URL for us. The downside is that this is asynchronous, although that could be an upside, depending on your use case.

The first task is to take our data and convert its underlying representation into a byte array (an ArrayBuffer). Then we convert that byte array into a Base64 encoded string with the help of FileReader.

// Asynchronously convert an ArrayBuffer to a Base64 string.
// The result is handed to the callback, without the MIME prefix.
// https://stackoverflow.com/questions/9267899/arraybuffer-to-base64-encoded-string
function arrayBufferToBase64(buffer, callback)
{
	var blob = new Blob([buffer], { type: 'application/octet-binary' });
	var reader = new FileReader();

	reader.onload = function (evt)
	{
		var dataurl = evt.target.result;
		// Notice the substring on the ',' to strip out the MIME prefix.
		// (substring is used because substr is deprecated.)
		callback(dataurl.substring(dataurl.indexOf(',') + 1));
	};
	reader.readAsDataURL(blob);
}

It’s either that or encode it ourselves in JavaScript, which doesn’t sound elegant or optimal.

Anything that does per-character conversion in pure JavaScript, instead of handing the work to a black box backed by a lower-level implementation, has a good chance of becoming sluggish and inefficient when the binary data is large.

Things I’ve Seen And Wouldn’t Recommend

There are forum posts out there that suggest using the JavaScript functions btoa or String.fromCharCode to do the conversion. I’d recommend against those.

btoa expects a string in which every character stands for a single byte. Given anything else, it does a string conversion of the data first, and then Base64 encodes that string.
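A quick way to see this, assuming a runtime that exposes a global btoa (browsers do, as do recent Node.js versions), is to hand it a typed array directly. The three byte values are arbitrary.

```javascript
// Assumes a global btoa (browsers, recent Node.js versions).
const rawBytes = new Uint8Array([1, 2, 3]);

// btoa first converts its argument to a string, so the typed array
// becomes the text "1,2,3" before any Base64 encoding happens.
const naive = btoa(rawBytes);
const encodedText = btoa("1,2,3");

// For comparison, the Base64 encoding of the three bytes 1, 2, 3 themselves.
const expectedForBytes = "AQID";

console.log(naive, naive === encodedText, naive === expectedForBytes);
```

The naive call encodes the comma-separated text rather than the bytes, so decoding it on the Unity side would not give back the original data.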

And if you’re using String.fromCharCode, you risk running into the “holes in the inverse mapping” issue. This should be safe for encoding string data, but if that’s the case, why are you binary encoding a string as a string!? Just use the original string! This might be safe for Unicode characters, but at that point, you’re probably using 2 or more bytes to represent a byte in your binary representation. Not to mention you’re also looping through the data on a per-byte level in raw JavaScript.

– Stay strong, code on. William Leu
