Two's Complement And Weird Bit Widths

In a couple of previous audio articles, I’ve covered how PCM data is stored as two’s complement integers while glossing over what two’s complement is. This was done to avoid going off-topic. Well, now we’re going to get into that subject and get hands-on with the topic.

Before going into the details, an explanation of what two’s complement is in a nutshell: it’s the most popular way computers represent negative integers on modern computers. If you’re not sure if your processor uses two’s complement or not, then it does.
A large prerequisite for this article and the code samples is understanding bitwise operations – specifically the Shifting operators, the AND operator, and the OR operator. These should not be confused with boolean data type operators – which are similar but serve a different purpose. And for simplicity, we’re going to overlook the topic of endianness (and use little-endian).

Table of Contents

How Computers Represent Integers
Negative Numbers
Bit Masks
Signed Magnitude Representation
Two’s Complement
How To Use Two’s Complement
Weird Widths In Files
- Two’s Complement Nibble
  - In The Wild
  - Demo
- Two’s Complement For 24 Bit Integers
  - In The Wild
  - Demo

How Computers Represent Integers

Let’s start with the basics of how integers are represented to get the ball rolling, and we’ll segue from there.

Computer integers are stored as a collection of bits and represent a base 2 number system (binary). Each bit is either a 0 or a 1, and each additional significant bit added represents an added power-of-two contribution (1, 2, 4, 8, 16, etc.). This is analogous to the base 10 system (decimal) we’re familiar with, where each digit can be 0 – 9, and each additional significant digit added represents an added power-of-10 contribution (1, 10, 100, 1000, 10000, etc.).

Two diagrams showing the similarities and differences between base 10 and base 2 number systems. The example values used are 935 for decimal, and 6 for binary.

So in the same way the number 935 is constructed as 9 * 100 + 3 * 10 + 5 * 1, the binary value 110 can be constructed as 1 * 4 + 1 * 2 + 0 * 1 (giving us a decimal value of 6). And the largest number we can represent in binary is limited by how many bits we use to represent a number.

Negative Numbers

But, this only gives us a system for representing positive numbers. What about negative numbers? This is solved by making the highest bit, also called the most significant bit (MSB), the sign bit:

The highest valued bit is reinterpreted as the sign (i.e., if the value is positive or negative).
The rest of the lower bits represent the magnitude (i.e., absolute value).

When representing negative integers, the MSB is used to represent the sign of the number. If it is set to 1, the value is negative.

If the sign bit is 0, it’s a positive number. If the sign bit is 1, it’s a negative value.

Bit Masks

We need two bitmasks, one to extract the sign bit and one to extract the magnitude bits. To calculate this, we perform some bit shifting. Or we can use known integer values where the bits are set properly (see the “Mask Value” entries in the tables below for examples). The formulas are listed below, with an example of 8 bits (a byte); for other widths, change the Bits value to match your integer’s bit width.

Also, note the “-1” trick being used for the magnitude bitmask to turn on the bits below the sign bit. If we have a bit pattern with only one bit set and subtract one, we get a bit pattern back where all the bits below it are turned on.
The decimal analogy to this would be how a number that’s a one followed by only zeros, when subtracted by one, returns back all nines for every previous digit before the 1 (e.g., 10000 – 1 = 9999, 1000 – 1 = 999, 100 – 1 = 99, etc.).

Signed Magnitude Representation

The simplest (conceptual) way to represent negative integers is to naively implement what’s explained above: where the MSB is the sign bit, and the rest is interpreted as the integer’s absolute value (a.k.a., magnitude). This is called signed magnitude representation (SMR); it’s a simple concept but actually uncommon in practice. Instead, what we often see everywhere in our Intel/ARM/Arduino inescapable world is the use of two’s complement.

Two’s Complement

Two’s complement is a type of bit representation similar to SMR but is more elaborate for representing negative numbers. If the sign bit is off (for positive numbers), the bit representation looks the same as SMR. But, if the sign bit is on (for negative numbers), the magnitude bits are inverted, that value is negated, and an additional 1 is subtracted to get the final value.

It might sound like a convoluted scheme, but there are good justifications for doing this.
The biggest reasons are because :

The circuitry logic for addition and subtraction are the same between unsigned integers and two’s complement integers. This is not the case for SMR; branching logic would be required.
You don’t waste a value representation for a redundant negative zero.

See this Stack Overflow discussion to read more on the topic.

While it’s mentioned above that addition and subtraction (and some other operations) can use the same circuitry, this is not the case for every type of operation.

As previously mentioned, there is also a one’s complement. It’s the same as two’s complement without the subtraction by 1. That’s all that will be said about that.

How To Use Two’s Complement

I’m sure I don’t have to say this, but I’ll say it anyway – the bit representation you use is not an option you choose or a switch you toggle on. The fact that your computer uses two’s complement is hard-coded into the circuitry for your processor’s different operations (i.e., add, subtract, multiply, divide, etc.).

Your processor has registers of specific bit-widths that it operates data on, and it has built-in operators to handle two’s complement operations for 1-byte (8 bit), 2-byte (16 bit), 4-byte (32 bit), and 8-byte (64 bit) integers. This is why programming languages have signed and unsigned types for integer data types, so the compiler knows what nuanced operators and functions to use to process and evaluate the bits: those operators meant for unsigned bits or those meant for two’s complement signed bits.

Weird Widths In Files

But, what happens when you have file formats that have two’s complement integers with a bit width that your processor and/or programming language don’t support? You will need to reinterpret their bits into a bit width and pattern that’s usable.

It’s uncommon to experience this issue, but things can get hairy with file formats now and then. The rest of this article is going to cover two examples of this.

Two’s Complement Nibble

The first bit-width I want to cover is with nibbles. Nibbles are numbers represented with 4 bits. Nibbles are tiny integers; they’re not even a byte – they’re half a byte.
If you’ve seen hex values for a byte with 0x followed by two alphanumerics (e.g., 0xFF), each alphanumeric is a nibble.

A nibble is so small we actually don’t have a way to effectively hold it. We only have the ability to hold things as small as 8 bits (a byte). So when we’re holding the data, it’s held in a byte, and we mask-out and ignore the other half of the byte that we’re not using. If the nibble is on the byte’s more-significant (left) side, we’ll also have to shift it to the right so that the nibble’s LSB matches the byte’s LSB.

When representing nibbles, a byte is the smallest memory width our processors work with; so we store it in a byte and ignore the other 4 bits. If the nibble is on the significant side, bit shift it right 4 bits to place it on the less-significant side.

Below is a code sample to convert the first four bits of a byte, containing a two’s complement nibble, and turn them into a signed int data type that your processor can work with.

public static int NibbleToFinetune(byte nib)
{
    // Get the sign bit
    int nibSign = nib & (1 << 3);
    
    if (nibSign != 0) // If sign bit is set, return a negative number.
    {
        // Negative number. Invert the magnitude bit by inverting
        // the entire nib parameter and masking to only keep the 
        // magnitude portion. And we can't forget about the minus 1.
        return = -(~nib & ((1 << 3) - 1)) - 1;
    }
    else
    {
        // Positive value. Just return back the magnitude region.
        // We could just return nib as is without masking if we 
        // were sure the rest of the bits (4-7) were zeroed out.
        return = (nib & ((1 << 3) - 1));
    }
}

In The Wild

One place I’ve seen two’s complement nibbles is for values in the mod tracker format. For context, mod files are old-timey music files originally designed for the Amiga. This is a format created when computers were extremely limited on RAM and disk space – while we have gigs of ram today and terabytes of disk today, having even a meg of RAM in those times was a lot. So many tricks were employed to save space (which make old-timey formats a pseudo-nightmare to parse).
There’s are communities where people still write new tracker songs, including mod files – check out some songs here.

In mod files, parameters for note effects are stored in 8 bits – and depending on the effect, the 8 bits could either be a byte parameter or two separate nibble parameters.

Demo

Below is a demo for two’s complement nibbles.

Two’s Complement Nibble
				=
Sign	Mag4	Mag2	Mag1

Two’s Complement For 24 Bit Integers

I’ve also seen 24-bit integers in audio formats.

To work with these 3-byte integers, we’ll need to convert them into a larger int data type that our processor works with. Below is a code sample to take 3 bytes of a 24-bit two’s complement integer and turn them into a signed 32-bit integer.

public static int ThreeBytesToTwoComplInt(byte [] rb)
{
    byte b0 = rb[i + 0];
    byte b1 = rb[i + 1];
    byte b2 = rb[i + 2];

    const int hiBit = 1 << 23;          // Sign bit mask
    const int magBits = hiBit - 1;      // Magnitude bits mask
    int v = (b0 << 0) | (b1 << 8) | (b2 << 16); // Combined bits

    if((hiBit & v) != 0)
        return = -(~v & magBits) - 1);
    else
	    return = v;
}

In The Wild

They can be used in large arrays to store the PCM of audio.

The Audacity “Export As WAV” dialog. The dialog has a pulldown for all the different WAV PCM formats, including 24 bit integers.

Demo

Here’s another demo, but for 24-bit integers:

Two’s Complement 24bit Integer (Little Endian)
Byte 2
								=
Sign	2²²	2²¹	2²⁰	2¹⁹	2¹⁸	2¹⁷	2¹⁶	=
Byte 1
								=
32768	16384	8192	4096	2048	1024	512	256	=
Byte 0
								=
128	64	32	16	8	4	2	1	=
24 Bit Value :

– Stay strong, code on. William Leu