Binary – Bits, Bitwise Operators & Bitmasks

First we define bits, bit fields and bit masks, relating these to data structures, addressable memory storage, registers and binary protocol frames.

Bitwise logic is introduced with simple practical examples – bitwise operators and bit shifts – setting individual bits on or off, extracting bits for reading, manipulating bits or checking bit field values.

Bitwise Logic Gates

In assembler, C and other programming languages, bit manipulation is common, especially when implementing low level hardware interfaces, building embedded microcontroller based IoT devices, or processing sensor, machine, network and peripheral data.

Graphics programming and image manipulation also utilises bitwise operations and bitmaps to perform animation (shifting and moving bits), compositing and pixel value modification.

Bits, Bytes & Bitmasks Explained

A bit is the smallest unit of storage, representing either a binary 1 or 0.

A byte is a group of binary digits or bits (typically eight) operated on as a unit.

Bit fields are a series or array of bits in adjacent memory.

Bit field values might represent a set of individual attributes (boolean on/off “flags” or status fields), a register value (storage address / instruction), or encapsulate binary encoded data.

AND bitmask

A bitmask (or mask) is an ordered set of bits of defined length used in bitwise logic to set, invert or manipulate another bit field.

Registers – Bit Arrays

Registers encapsulate a set of ordered bits in storage with a defined bit length, typically a power of 2 (commonly 8 / 16 / 32 / 64 / 128 bit).

Bit index:  7 6 5 4 3 2 1 0
8 bit register, where the least significant bit (LSB) is bit 0 with place value 2⁰ = 1

An 8 bit register can hold unsigned (non-negative) numbers from 0 to 255, or signed values from -128 to +127 in two's complement representation (where bit 7 acts as the sign bit).
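As a rough illustration of the two ranges, the sketch below reinterprets the same 8 bit pattern as an unsigned and as a two's complement signed number. It is written in Python (the language this document uses later for its gateway) rather than C, and the helper name to_signed8 is hypothetical:

```python
def to_signed8(value):
    """Interpret the low 8 bits of value as a two's complement signed number."""
    value &= 0xFF  # keep only bits 0-7
    # if bit 7 (the sign bit) is set, the pattern represents value - 2^8
    return value - 256 if value & 0x80 else value


print(to_signed8(0b01111111))  # 127, largest positive value
print(to_signed8(0b10000000))  # -128, most negative value
print(to_signed8(0b11111111))  # -1 (255 when read as unsigned)
```

The same eight bits therefore carry different numbers depending on whether the reader treats bit 7 as a sign bit.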

Registers hold operands and memory addresses for computations within the central processing unit, as well as configuration parameters and IO data for storage, ports or peripherals.

Bits in a Byte – Memory Address

Bitwise shift operators, along with logical AND can be used to access individual bits within a byte of memory for any variable in C.

To read the value of the bit at index 4 of an 8 bit variable x

uint8_t x = 16; // 0001 0000
printf("%d\n", (x >> 4) & 0x01); // Prints 1

Network Protocols – Binary Bit Sequences

Bit sequences are used in network protocols where packetised messages are arranged into frames containing fields with defined offset and length.

A frame refers to the entire data packet which is being sent/received during a communication. Each protocol defines a specification of its own frame format.

An RS232 serial protocol frame defines a bit sequence –

Frame:        Start   Data   Parity   Stop
Size (bits):  1       5-9    1        1-2
Bit sequence format of a USART serial protocol frame

Bitwise operators and shifts can be used to efficiently read, set or modify fields within a network protocol packet.
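A minimal sketch of reading and writing packet fields with shifts and masks follows. The 8 bit header layout is hypothetical and does not come from any real protocol, and the names pack_header and unpack_header are made up for illustration:

```python
# Hypothetical 8 bit header layout (illustration only):
#   bits 7-5: message type, bit 4: ack flag, bits 3-0: payload length
TYPE_SHIFT, TYPE_MASK = 5, 0b111
ACK_SHIFT, ACK_MASK = 4, 0b1
LEN_SHIFT, LEN_MASK = 0, 0b1111


def pack_header(msg_type, ack, length):
    """Mask each field to its width, shift it to its offset, then OR the fields together."""
    return (((msg_type & TYPE_MASK) << TYPE_SHIFT)
            | ((ack & ACK_MASK) << ACK_SHIFT)
            | ((length & LEN_MASK) << LEN_SHIFT))


def unpack_header(header):
    """Shift each field down to bit 0, then mask off the bits above it."""
    return ((header >> TYPE_SHIFT) & TYPE_MASK,
            (header >> ACK_SHIFT) & ACK_MASK,
            (header >> LEN_SHIFT) & LEN_MASK)


header = pack_header(5, 1, 9)
print(format(header, "08b"))  # 10111001
print(unpack_header(header))  # (5, 1, 9)
```

Each field is described by an offset and a width, which mirrors how a protocol specification defines its frame format.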

Bitwise Operators

Bitwise operations allow setting bit sequence values in a single operation and are more efficient than loops or maintaining individual bits.

Symbol   Operator
&        bitwise AND
|        bitwise inclusive OR
^        bitwise XOR (exclusive OR)
<<       left shift
>>       right shift
~        bitwise NOT (one’s complement) (unary)
Bitwise and shift operators

Logical operators AND, OR, XOR (Exclusive OR) and NOT (Negation) implement bitwise manipulation, incurring a minimal number of processing instructions.

Comparing bit by bit –

  • OR sets a bit if one or both operand bits are 1: 1110 OR 0000 = 1110
  • AND sets a bit if both operand bits are 1: 1010 AND 1101 = 1000
  • AND with a zero-check tests whether any masked bit is set: 1010 AND 0010 = 0010 ≠ 0
  • XOR sets a bit if exactly one operand bit is 1: 1010 XOR 0100 = 1110
  • NOT inverts all bits: NOT 1011 = 0100

Bitwise operators work on integer and character data types and do not modify the value of their operands, so a compound assignment (|=, &=, ^=, <<=, >>=) is typically used to store the result, for example x |= (y & z).
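The bit by bit comparisons listed above can be checked with a short sketch. It is written in Python, where integers have no fixed width, so the NOT result is masked to 4 bits by hand (MASK4 is an illustrative name):

```python
a, b = 0b1010, 0b1101
MASK4 = 0b1111  # Python integers are unbounded, so limit results to 4 bits

print(format(0b1110 | 0b0000, "04b"))  # 1110  (OR)
print(format(a & b, "04b"))            # 1000  (AND)
print((a & 0b0010) != 0)               # True  (AND with zero-check)
print(format(a ^ 0b0100, "04b"))       # 1110  (XOR)
print(format(~0b1011 & MASK4, "04b"))  # 0100  (NOT)

# operators do not modify their operands, so the result is assigned back
x, y, z = 0b0001, 0b0110, 0b0100
x |= (y & z)
print(format(x, "04b"))                # 0101
```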

Arithmetic Bit Shifts

Left Arithmetic Shift

Bit or Arithmetic shift operators << >> treat a value as a series of individual bits rather than a numerical quantity, shifting (or moving) bit positions left or right.

These operations are useful to move a bit or set of bits to a specific positional index.

In a left shift operation (<<) bits are moved left by the specified number of positions and the vacated low-order bits are zero filled. Provided no set bits are shifted out, this is equivalent to multiplying an integer by a power of 2.

01101011
8 bit Binary encoding of decimal 107

Left shifting 2 bits ( << 2) results in

10101100
8 bit Binary encoding of decimal 172 (107 × 4 = 428 does not fit in 8 bits, so the bits shifted past bit 7 are lost)

A right shift operation (>>) moves bits right by a specified number of positions, dividing the number by 2 for each position shifted and discarding any remainder.

In more general form we can say –

// x multiplied by 2ⁿ
x << n
// x divided by 2ⁿ
x >> n

Notes –

  • Bits shifted past either end are lost; they do not wrap around.
  • Right shift on negative signed values is implementation-defined in ISO C: gcc documents sign extension (an arithmetic shift), but other implementations may zero-fill the upper bits.
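To make the overflow note concrete, the sketch below emulates an 8 bit register in Python by masking every result to 8 bits. The helper names shl8 and shr8 are hypothetical, and the right shift shown is the zero-filling (logical) variant for unsigned values:

```python
WIDTH_MASK = 0xFF  # emulate an 8 bit register


def shl8(value, n):
    """Left shift within 8 bits: bits moved past bit 7 are discarded."""
    return (value << n) & WIDTH_MASK


def shr8(value, n):
    """Logical right shift of an unsigned 8 bit value: upper bits are zero filled."""
    return (value & WIDTH_MASK) >> n


x = 0b01101011             # 107
print(shl8(x, 2), x * 4)   # 172 428 -> differs, set bits were shifted out
print(shr8(x, 2), x // 4)  # 26 26   -> integer division by 2^2
print(shl8(3, 2), 3 * 4)   # 12 12   -> no overflow, equals multiplication
```

The multiplication equivalence only holds while the result still fits in the register width.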

Defining bit masks in C

Bitmasks are used in bit manipulation and, combined with a logical operator (AND, OR, XOR), define a pattern of bits to keep, set or discard.

In C++14, which supports binary literals (also available in C23 and as a GCC extension), bit masks can be defined as –

const uint8_t mask0 = 0b00000001 ; // represents bit 0
const uint8_t mask1 = 0b00000010 ; // represents bit 1
const uint8_t mask2 = 0b00000100 ; // represents bit 2
...  

Using Hexadecimal

const uint8_t mask0 = 0x01 ; // bit 0  0000 0001
const uint8_t mask1 = 0x02 ; // bit 1  0000 0010
const uint8_t mask2 = 0x04 ; // bit 2  0000 0100
...  

Or with bit shift operator

const uint8_t mask0 = 1 << 0 ; // 0000 0001
const uint8_t mask1 = 1 << 1 ; // 0000 0010
const uint8_t mask2 = 1 << 2 ; // 0000 0100
...
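Single bit masks like these are often given names and combined. The sketch below, in Python, uses hypothetical status flag names to show that the binary, hexadecimal and shift notations describe the same value, and how masks are combined with OR and tested with AND:

```python
# Hypothetical status flags, one bit each (names are for illustration only)
FLAG_READY = 1 << 0  # 0000 0001
FLAG_ERROR = 1 << 1  # 0000 0010
FLAG_BUSY = 1 << 2   # 0000 0100

# the three notations describe the same mask
print(FLAG_BUSY == 0x04 == 0b00000100)  # True

status = FLAG_READY | FLAG_BUSY          # combine masks with OR
print(format(status, "08b"))             # 00000101
print(bool(status & FLAG_ERROR))         # False, bit 1 is not set
print(bool(status & (FLAG_READY | FLAG_ERROR)))  # True, at least one of the two is set
```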

Set, Clear, Modify, Toggle & Check Bits

Set a bit
Set the nth bit of number (bits are indexed from zero)

number |= 1UL << n;

Set bit 0 of i to one

i |= 1 << 0;

// Example
000 i
001 1 << 0;
001 i |= 1 << 0;

  • the << operator is the left “bit shift” operator, which moves all bits n positions to the left.
  • 1UL specifies the numeric literal 1 with type unsigned long

Clear a bit
Set nth bit of number to zero

number &= ~(1UL << n);

Set bit 1 of i to zero

i &= ~(1UL << 1);

// Example
010 i
101 ~(1UL << 1)
000 i &= ~(1UL << 1);
  • ~ inverts the mask 010 to 101, leaving only bit 1 as 0
  • AND yields 0 wherever the mask bit is 0, and keeps the other bits of i unchanged

Toggle (flip) a bit

Toggle nth bit of number

number ^= 1UL << n;

Toggle (flip) bit 0 of i

i ^= 1 << 0;

// Example
001 i
001 1 << 0
000 i ^= 1 << 0
  • ^= represents XOR exclusive OR with assignment, output is true when operand bits are different

Check a bit is set

True if nth bit of number equals 1

bit = (number >> n) & 1U;

Check bit 7 of i, assigning 1 or 0 to bit

int bit = (i >> 7) & 1U;
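The set, clear, toggle and check expressions can be wrapped as small functions. This is a sketch in Python rather than C, with hypothetical function names, and it returns the new value instead of modifying the argument in place:

```python
def set_bit(number, n):
    """Return number with bit n set to 1."""
    return number | (1 << n)


def clear_bit(number, n):
    """Return number with bit n set to 0."""
    return number & ~(1 << n)


def toggle_bit(number, n):
    """Return number with bit n inverted."""
    return number ^ (1 << n)


def check_bit(number, n):
    """Return 1 if bit n of number is set, otherwise 0."""
    return (number >> n) & 1


print(format(set_bit(0b000, 0), "03b"))     # 001
print(format(clear_bit(0b010, 1), "03b"))   # 000
print(format(toggle_bit(0b001, 0), "03b"))  # 000
print(check_bit(0b10000000, 7))             # 1
```

The printed values match the worked examples given for each operation.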

Modify – Changing a bit to x (1 or 0)

// branch-free: -(unsigned long)x is all ones when x is 1 and zero when x is 0
number ^= (-(unsigned long)x ^ number) & (1UL << n);
// portable
number = (number & ~(1UL << n)) | ((unsigned long)x << n);

Set bit 7 of i to x

i ^= (-x ^ i) & (1UL << 7);
i = (i & ~(1UL << 7)) | (x << 7);

Notes –

  • range / bounds checking is not applied; an out of bounds shift index results in undefined behaviour
  • Use 1ULL if number is wider than unsigned long
  • endianness (byte order in memory) and the signedness of char vary between platforms and compilers
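Since the expressions above apply no bounds checking, one option is to validate the index before shifting. The sketch below shows the portable clear-then-set form in Python with explicit checks; the function name write_bit and the default width of 8 are assumptions made for illustration:

```python
def write_bit(number, n, x, width=8):
    """Return number with bit n changed to x (0 or 1), checking the index first."""
    if not 0 <= n < width:
        raise ValueError(f"bit index {n} is outside 0..{width - 1}")
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    mask = 1 << n
    # clear bit n, then OR in the new value at the same position
    return (number & ~mask) | (x << n)


print(format(write_bit(0b00000000, 7, 1), "08b"))  # 10000000
print(format(write_bit(0b11111111, 7, 0), "08b"))  # 01111111
```

In C the equivalent check would compare the index against the bit width of the operand type before shifting.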

Bitwise Functions as C Pre-Processor Macros

To minimise code duplication bit operator functions can be defined in a header file as pre-processor macros

/* a=number, b=bit index 0-n */
#define BIT_SET(a,b) ((a) |= (1UL<<(b)))
#define BIT_CLEAR(a,b) ((a) &= ~(1UL<<(b)))
#define BIT_FLIP(a,b) ((a) ^= (1UL<<(b)))
#define BIT_CHECK(a,b) (!!((a) & (1UL<<(b))))

Print Bits in C as Left Padded Binary String

In C printf() does not have a format specifier to print a string representation of binary, but we can write a function to achieve this –

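#include <stdio.h>

const unsigned bits = 8;

// print the low `bits` bits of value as a left padded binary string
void print_bits(unsigned value)
{
    // start with a mask selecting the most significant bit to print
    unsigned mask = 1u << (bits-1);

    while (mask)
    {
        printf("%d", (mask & value) != 0);
        mask >>= 1;
    }

    printf("\n");
}

int main(void)
{
  int i = 0x145; // 1 0100 0101, only the low 8 bits are printed
  print_bits(i);
  return 0;
}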

Result:
01000101
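For comparison, the same mask-walking loop can be sketched in Python, which also has a built-in binary format specifier to check the result against. The function name bits_string is hypothetical:

```python
def bits_string(value, bits=8):
    """Build a left padded binary string by walking a mask from the top bit down."""
    mask = 1 << (bits - 1)
    digits = []
    while mask:
        digits.append("1" if value & mask else "0")
        mask >>= 1
    return "".join(digits)


print(bits_string(0x145))           # 01000101 (bit 8 of 0x145 is not printed)
print(format(0x145 & 0xFF, "08b"))  # 01000101, built-in formatting of the low 8 bits
```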

References and Further Reading:

[1] Methods for Bit Manipulation & Discussion:
https://stackoverflow.com/questions/47981/how-do-you-set-clear-and-toggle-a-single-bit/263738#263738

[2] Theory and examples of Bitwise Operation:
https://en.wikipedia.org/wiki/Bitwise_operation

[3] Bit Twiddling Hacks – Advanced Bitwise Algorithms:
http://graphics.stanford.edu/~seander/bithacks.html

WebSocket Binary – ESP8266 to Web Browser

Binary wire protocols are long established for embedded machine to machine (M2M) communication, network applications and wireless radio data transmission.

Internet of Things (IoT) devices, real time sensors, robotics, smart home or industrial machine control data also demand efficient, low latency and lightweight data communications.

The WebSocket protocol (RFC 6455) brings native support for binary framed messaging to web browser clients, offering a compact lightweight format for fast and efficient endpoint messaging.

Why use binary format data messaging?

Compared to serialisation of more complex text based wire formats, binary is lightweight and requires minimal storage / bandwidth and processing.

Taking an example key/value command data message:

// JSON encoding
{"cmd":101,"value":180}
23 characters = 23 bytes as UTF-8 (23 * 2 = 46 bytes as UTF-16)

// CSV plain text encoding
101,180\n
8 characters = 8 bytes as UTF-8 (8 * 2 = 16 bytes as UTF-16)

// Binary
101 180
int (4 bytes) + int (4 bytes)
4+4 = 8 bytes
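The payload sizes can be measured rather than counted by hand. The sketch below uses Python's json and struct modules and counts UTF-8 bytes, the encoding WebSocket text frames use on the wire. Packing the two values as single unsigned bytes is an added assumption, possible here only because 101 and 180 both fit in 8 bits:

```python
import json
import struct

message = {"cmd": 101, "value": 180}

# compact JSON without spaces after separators, encoded as UTF-8
json_bytes = json.dumps(message, separators=(",", ":")).encode("utf-8")
csv_bytes = "101,180\n".encode("utf-8")
bin_int32 = struct.pack("<ii", 101, 180)  # two 4 byte little endian ints
bin_uint8 = struct.pack("<BB", 101, 180)  # two unsigned bytes (assumes values <= 255)

print(len(json_bytes), len(csv_bytes), len(bin_int32), len(bin_uint8))  # 23 8 8 2
```

The binary size is fixed by the chosen field types, whereas the text sizes grow with the number of digits and the length of the key names.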

In case of high performance applications supporting a large number of clients or very high frequency of data exchange, minimising data size, bandwidth and processing becomes an important priority.

Binary wire protocols are long established for embedded M2M messaging

Taking as a simple example an embedded ESP8266 WiFi device, message gateway and web browser client, data serialisation and bidirectional binary framed WebSocket data exchange are demonstrated.

ESP8266 Byte Array Serialisation

Internally data is represented in embedded microcontrollers as ones and zeros, sequences of bits arranged in addressable memory.

Higher level programming language abstraction provides human readable textual labels and in case of C/C++ associated type information.

Let's define a mixed type data structure that could be some kind of sensor or message data payload –

    // define mixed type data struct
    struct Data
    {
        int id;
        float v1;
        float v2;
        unsigned long v3;
        char v4[20];
    };

    struct Data data;

    // populate data values
    data.id = 67;
    data.v1 = 3.14157;
    data.v2 = -7.123;

    unsigned long ts = millis();
    data.v3 = ts;

    char c[20] = "N NE E SE S SW W NW";
    strncpy(data.v4, c, 20);

To access underlying bytes, a pointer to data structure address is created –

    uint8_t * bytePtr = (uint8_t*) &data;    
    webSocket.sendBIN(bytePtr, sizeof(data));

The data pointer and length are passed to the WebSocket send method “webSocket.sendBIN()”. The byte range is read, packaged (framed) according to the protocol specification and written to the TCP/IP network socket.

Hexadecimal and binary text representations of the in-memory data structure can also be displayed –

void printBytes(const void *object, size_t size)
{
    const uint8_t * byte;
    for ( byte = (const uint8_t *) object; size--; ++byte )
    {
        Serial.print(*byte, HEX);
        Serial.print("\t");
        Serial.println(*byte, BIN);
    }
    Serial.println('\n');
}
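A similar dump can be produced on the receiving side for comparison. This is a sketch in Python with a hypothetical function name dump_bytes; unlike the device listing, it zero-pads both columns so every row has the same width:

```python
def dump_bytes(payload):
    """Return one line per byte: two hex digits, a tab, then eight binary digits."""
    return "\n".join(f"{byte:02X}\t{byte:08b}" for byte in payload)


# the first four bytes of a little endian int with value 67
print(dump_bytes(bytes.fromhex("43000000")))
```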

Python WebSocket Server

A Python 3 middleware process hosts the WebSocket server and acts as a message relay gateway.

Binary WebSocket messages can be decoded in Python, the struct module performs conversions between Python data types and C structs –

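import asyncio
import struct

import websockets

# USERS is the set of connected client sockets, assumed to be maintained elsewhere

async def wsApi(websocket, path):
    try:
        async for message in websocket:
            print('User-Agent: ' + websocket.request_headers['User-Agent'])
            print('Sec-WebSocket-Key: ' + websocket.request_headers['Sec-WebSocket-Key'])
            print('MessageType: ' + str(type(message)))
            print(message)

            if isinstance(message, (bytes, bytearray)):
                # hex() exists on binary messages only, not on text (str) messages
                print('Hex: ' + message.hex())

                # int id: slice the first 4 bytes
                i = message[:4]
                print(i)
                tuple_of_data = struct.unpack("i", i)
                print(tuple_of_data)

                # float v1 at byte offset 4
                tuple_of_data = struct.unpack_from("f", message, 4)
                print(tuple_of_data)

                # float v2 at byte offset 8
                tuple_of_data = struct.unpack_from("f", message, 8)
                print(tuple_of_data)

                # unsigned long v3 at offset 12, read here as a signed int
                # ("I" would decode it as unsigned)
                tuple_of_data = struct.unpack_from("i", message, 12)
                print(tuple_of_data)

                # char v4[20] at byte offset 16
                tuple_of_data = struct.unpack_from("20s", message, 16)
                print(tuple_of_data[0])

                # forward message to all connected clients
                await asyncio.gather(*(user.send(message) for user in USERS))
    except websockets.exceptions.ConnectionClosed:
        pass  # handler assumed, the listing does not show one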

To index into the byte array and read the number of bytes required by the data type being unpacked, Python’s slice syntax “i = message[:4]” can be used, where [<from>:<to>] specifies start/end positions.

Method struct.unpack_from() is another approach, taking as parameters a format string specifying the data type (“i” – integer, “f” – float), the data buffer and an offset (in bytes) to read from.

Here is decoded binary message output including some WebSocket headers –

User-Agent: arduino-WebSocket-Client
Sec-WebSocket-Key: zoJ0aR/5XunSvEKKcUkWfQ==
MessageType: <class 'bytes'>
b'C\x00\x00\x00|\x0fI@\x9e\xef\xe3\xc0\xb9\x17\x00\x00N NE E SE S SW W NW\x00'
Hex: 430000007c0f49409eefe3c0b91700004e204e45204520534520532053572057204e5700
b'C\x00\x00\x00'
(67,)
(3.1415700912475586,)
(-7.123000144958496,)
(6073,)
b'N NE E SE S SW W NW\x00'
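As an alternative to unpacking field by field, the whole payload can be decoded in one call. The sketch below feeds the hex string from the gateway log above to struct.unpack. The format string "<iffI20s" is an assumption about the device layout: little endian, 4 byte int and unsigned long, and no padding between members:

```python
import struct

# Assumed layout: int id, float v1, float v2, unsigned long v3, char v4[20]
PAYLOAD_FORMAT = "<iffI20s"

payload = bytes.fromhex(
    "430000007c0f49409eefe3c0b9170000"
    "4e204e45204520534520532053572057204e5700"
)

ident, v1, v2, v3, v4 = struct.unpack(PAYLOAD_FORMAT, payload)
text = v4.split(b"\x00", 1)[0].decode("ascii")  # drop the C string terminator

print(struct.calcsize(PAYLOAD_FORMAT))             # 36
print(ident, round(v1, 5), round(v2, 3), v3, text)
```

A single format string also documents the frame layout in one place, which helps when the structure changes.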

Web Browser – Binary Encode/Decode in JavaScript

In the web browser, the JavaScript objects Blob, ArrayBuffer and TypedArray perform a similar conversion.

Firstly, received WebSocket messages (event object) can be debugged to console –

 websocket.onmessage = function (event) {
    console.log(event);

Binary framed data payload is reported as type “Blob” (raw data) of length 36 bytes –

Chrome Developer Tools console log for WebSocket Binary message receive event

To de-serialise message, raw data Blob is converted asynchronously using FileReader API to ArrayBuffer, a generic fixed length binary data buffer –

    if (event.data instanceof Blob)  // Binary Frame
    {
      // convert Blob to ArrayBuffer
      var arrayPromise = new Promise(function(resolve) {
          var reader = new FileReader();

          reader.onloadend = function() {
              resolve(reader.result);
          };

          reader.readAsArrayBuffer(event.data);
      });

When the promise is fulfilled, the ArrayBuffer can be read using typed views (Uint32Array, Float32Array) for integer (including long) and float types; the TextDecoder API is used to decode the character array –

arrayPromise.then(function(buffer) {

          // Decoding Binary Packed Data

          // int (4 bytes)
          var arrInt = new Uint32Array(buffer);
          var id = arrInt[0];
          console.log("id:"+id);

          // 2x float (4 bytes each), starting at byte offset 4
          var arrFloat = new Float32Array(buffer, 4, 2);
          var v1 = arrFloat[0];
          var v2 = arrFloat[1];
          console.log("v1: "+v1);
          console.log("v2: "+v2);

          // long (4 bytes)
          var v3 = arrInt[3];
          console.log("v3:"+v3);

          // character data (20 bytes)
          var uint8Array = new Uint8Array(buffer,16);
          var string = new TextDecoder("utf-8").decode(uint8Array);
          console.log(string);
      });

JavaScript Binary Data Encoding

Binary data can also be encoded from native JavaScript. TypedArrays created for each data type – integer, float, long and character array are populated and packed into an ArrayBuffer suitable for use as WebSocket data payload –

console.log("Binary Encode example");

// Binary Encode example
var buffer = new ArrayBuffer(36);
var arrInt =   new Uint32Array(buffer, 0, 1);
arrInt[0] = 67;
var arrFloat = new Float32Array(buffer, 4, 2);
arrFloat[0] = 3.14157;
arrFloat[1] = -7.123;

// Date.now() exceeds 32 bits, so only the low 32 bits are stored
var arrInt2 =   new Uint32Array(buffer, 12, 1);
arrInt2[0] = Date.now();

var uint8Array = new Uint8Array(buffer,16);
// TextEncoder always encodes to UTF-8 and takes no arguments
var charBuffer = new TextEncoder().encode("N NE E SE S SW W NW");

for(var i = 0; i<charBuffer.length; i++)
{
  uint8Array[i] = charBuffer[i];
}

// send binary data
websocket.send(buffer);

At message gateway, logs demonstrate parity between data packed by embedded device and those sent from web browser client –

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36
Sec-WebSocket-Key: 1TD9Zp71cMTivUbj+QSx5w==
MessageType: <class 'bytes'>
b'C\x00\x00\x00|\x0fI@\x9e\xef\xe3\xc0\x07\x1c\xff\x91N NE E SE S SW W NW\x00'
Hex: 430000007c0f49409eefe3c0071cff914e204e45204520534520532053572057204e5700
b'C\x00\x00\x00'
(67,)
(3.1415700912475586,)
(-7.123000144958496,)
(-1845552121,)
b'N NE E SE S SW W NW\x00'
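The negative fourth value in this log comes from how the same four bytes are interpreted, not from the bytes themselves. The browser stored the low 32 bits of its timestamp as an unsigned integer, and the gateway decoded them with the signed format character "i". A minimal sketch in Python, using the four timestamp bytes from the hex dump above:

```python
import struct

raw = bytes.fromhex("071cff91")  # bytes 12-15 of the logged payload

signed = struct.unpack("<i", raw)[0]    # bit 31 is treated as a sign bit
unsigned = struct.unpack("<I", raw)[0]  # all 32 bits are treated as magnitude

print(signed)    # -1845552121
print(unsigned)  # 2449415175
print(unsigned == signed + 2**32)  # True
```

Using "I" at the gateway would report the value the browser actually wrote.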

Limitations / Drawbacks

Compared to UTF-8 text formats (XML, JSON) packed binary data has significant disadvantages –

  • legibility – text based key/value formats are easy to read, manipulate and maintain
  • fixed frame boundaries – using positional byte sequence indexes means even small changes to message structure, size or field position require updates to consumer client code
  • endianness / alignment / padding must be maintained consistently, as compiler and platform implementation differences may occur
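The byte order and padding points can be seen with Python's struct module, where the first character of the format string selects the byte order and alignment rules. This is only a sketch, and the native "@" size in the last line depends on the platform it runs on:

```python
import struct

little = struct.pack("<I", 67)  # least significant byte first
big = struct.pack(">I", 67)     # most significant byte first
print(little)  # b'C\x00\x00\x00'
print(big)     # b'\x00\x00\x00C'

# "<" packs members back to back, "@" applies the platform's native alignment
packed_size = struct.calcsize("<bi")
native_size = struct.calcsize("@bi")
print(packed_size)  # 5
print(native_size)  # platform dependent, padding may be inserted after the char
```

Sender and receiver must agree on both choices, otherwise field offsets and values no longer line up.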

Security

WebSockets Secure (WSS) offers transport layer security (TLS) to encrypt data streams. An authentication and authorisation strategy (challenge/response password, token or certificate based) for client identification should also be deployed. Cryptographic message digest signing or encryption might also be used as extra protection for critical data.