In-depth explanation of CPU endianness

Big-Endian

In big-endian, the most significant byte of data is stored at the lowest memory address, and the least significant byte is stored at the highest memory address. For example, the hexadecimal number 0x12345678 is stored in memory in the order 12 34 56 78 in big-endian. This layout matches the left-to-right order in which people write numbers, so it is convenient when humans need to read raw data directly, for example in a hex dump.

Little-Endian

In contrast to big-endian, little-endian stores the least significant byte of data at the lowest memory address and the most significant byte at the highest memory address. For example, the hexadecimal number 0x12345678 would be stored in memory in the order 78 56 34 12 in little-endian. Little-endian is the more common byte order among current processors; the usual justification is that the low-order byte sits at the value's base address, which simplifies some low-level operations.
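The two layouts of 0x12345678 described above can be sketched in C. This is a minimal illustration, not a library API: the helper names `store_be32` and `store_le32` are made up for this example, and shifts are used so the result does not depend on the byte order of the machine running the code.

```c
#include <stdint.h>

/* Hypothetical helper: write value into buf with the most
 * significant byte at the lowest index (big-endian layout). */
void store_be32(uint8_t buf[4], uint32_t value)
{
    buf[0] = (uint8_t)(value >> 24);
    buf[1] = (uint8_t)(value >> 16);
    buf[2] = (uint8_t)(value >> 8);
    buf[3] = (uint8_t)value;
}

/* Hypothetical helper: write value into buf with the least
 * significant byte at the lowest index (little-endian layout). */
void store_le32(uint8_t buf[4], uint32_t value)
{
    buf[0] = (uint8_t)value;
    buf[1] = (uint8_t)(value >> 8);
    buf[2] = (uint8_t)(value >> 16);
    buf[3] = (uint8_t)(value >> 24);
}
```

Calling `store_be32(buf, 0x12345678)` fills `buf` with 12 34 56 78, while `store_le32` fills it with 78 56 34 12.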

Advantages and disadvantages of big-endian and little-endian modes

Advantages of big-endian mode:

Determining whether a signed value is positive or negative is easy, because the sign bit is in the high-order byte, which sits at the value's lowest address.

Disadvantages of big-endian mode:

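The least significant byte sits at the highest address, so operations that start from the low-order byte, such as multi-byte addition with carry, have to begin at the end of the value and work backwards through memory.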

Advantages of little-endian mode:

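The least significant byte sits at the lowest address, so arithmetic that proceeds from the low-order byte upward follows increasing addresses, and the same address can be used to read a value at a narrower width. Both properties can simplify reading and writing.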

Disadvantages of little-endian mode:

Determining whether a signed value is positive or negative requires locating the last byte of the value first, because the sign bit is in the byte at the highest address.
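The sign-bit point can be made concrete with a small sketch. It assumes two's-complement integers held in a raw byte buffer of at least one byte; the function names `is_negative_be` and `is_negative_le` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Big-endian buffer: the sign bit is in the first byte,
 * so the width of the value does not matter. */
int is_negative_be(const uint8_t *buf, size_t len)
{
    (void)len; /* unused: the first byte is enough */
    return (buf[0] & 0x80) != 0;
}

/* Little-endian buffer: the sign bit is in the last byte,
 * so the width must be known to find it. */
int is_negative_le(const uint8_t *buf, size_t len)
{
    return (buf[len - 1] & 0x80) != 0;
}
```

On a real CPU a whole word is normally loaded into a register before the sign is tested, so the difference matters mainly when inspecting values byte by byte.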

Big-endian and little-endian modes in different architectures

Different architectures have different default modes:

x86 architecture: uses little-endian mode.

Network protocols: network byte order is big-endian (e.g., in the TCP/IP protocol suite).

Why is there a distinction between big-endian and little-endian modes?

Computer memory is addressed in units of bytes: each address corresponds to one byte, and one byte is 8 bits.

However, in C, besides the 8-bit `char`, there are also wider types such as the 16-bit `short` and the 32-bit `int` (the exact sizes depend on the compiler). Furthermore, for processors with more than 8 bits, such as 16-bit or 32-bit processors, the register width is greater than one byte, so there is inevitably the question of how to arrange the bytes of one value in memory. This leads to the differences between big-endian and little-endian memory models.

For example, if a 16-bit short type x has a memory address of 0x0010 and a value of 0x1122, then 0x11 is the high byte and 0x22 is the low byte.

In big-endian mode, 0x11 is placed in the lower address, i.e., 0x0010, and 0x22 is placed in the higher address, i.e., 0x0011. Little-endian mode is the opposite: 0x22 is placed at 0x0010 and 0x11 at 0x0011.
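The 0x1122 example also gives a simple way to check which mode the current machine uses: store the value in a 16-bit variable and look at the byte at its lowest address. The following is a minimal sketch; the names `first_byte_of_u16` and `host_is_little_endian` are chosen for this illustration only.

```c
#include <stdint.h>
#include <string.h>

/* Return the byte stored at the lowest address of a 16-bit value. */
uint8_t first_byte_of_u16(uint16_t x)
{
    uint8_t b;
    memcpy(&b, &x, 1); /* copy only the first byte of x */
    return b;
}

/* 1 if the low byte 0x22 comes first in memory, 0 otherwise. */
int host_is_little_endian(void)
{
    return first_byte_of_u16(0x1122) == 0x22;
}
```

On an x86 machine `host_is_little_endian()` is expected to return 1, because the low byte 0x22 is stored first.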

I. Endianness

1. Big-Endian

Big-endian, also known as big-endian byte order, stores the most significant byte (0x12 in 0x12345678) at the lowest memory address and the least significant byte (0x78) at the highest memory address. This is similar to how humans write numbers from left to right, with the most significant digit first and the least significant digit last. For example, the hexadecimal number 0x12345678 consists of four bytes: 0x12, 0x34, 0x56, and 0x78. In big-endian, the storage order is 0x12 0x34 0x56 0x78. From a memory address perspective, the most significant byte 0x12 is stored at the lowest address, and 0x34, 0x56, and 0x78 are stored sequentially as the address increases. This storage method aligns with human intuition and has advantages in scenarios where humans need to directly read and process data.

To better understand big-endian, we can imagine a bookshelf, where each shelf represents a memory address, and the books represent bytes of data. When we arrange the books in big-endian format, the most important information (the high-order byte) is placed on the bottom shelf (the lowest memory address), and less important information (the low-order bytes) is placed on successively higher shelves. This way, when we read the books starting from the bottom, we retrieve the data in the order we are accustomed to.

Endianness refers to the order in which bytes are stored in a computer system when storing multi-byte data.

Computer system memory is divided into bytes, with each address unit corresponding to one byte. A byte is 8 bits in size and can store an 8-bit binary number, such as 10101010. However, in C, besides the 8-bit `char` type, there are also 16-bit `short` and 32-bit `long` types, depending on the specific compiler. Furthermore, for processors with more than 8 bits, such as 16-bit or 32-bit processors, since the register width is greater than one byte, there is inevitably the problem of how to arrange multiple bytes into memory, leading to big-endian and little-endian memory models.

2. Little-Endian

In contrast to big-endian, little-endian stores the least significant byte (LSB) of data at the lowest memory address and the most significant byte at the highest memory address. Using 0x12345678 as an example, in little-endian, the storage order is 0x78 0x56 0x34 0x12. This means that the LSB 0x78 is stored at the lowest address, while the most significant byte 0x12 is stored at the highest address. Little-endian is widely used in common processor architectures: x86 is little-endian, and ARM, which can support both byte orders, usually runs in little-endian mode.

Let's continue using the bookshelf example to understand little-endian. In little-endian, less important information (the low-order byte) is placed on the bottom shelf (low address), while important information (the high-order byte) is placed on the top shelf (high address). Although this storage method differs from human writing habits, it has advantages in computer processing. For example, multi-byte addition and subtraction start from the low-order byte, which little-endian places at the lowest address, so the carry propagates in the direction of increasing addresses.
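The remark about addition can be illustrated with multi-byte arithmetic done one byte at a time. The sketch below assumes both operands are stored as little-endian byte arrays of equal length; `add_le` is a hypothetical name used only here.

```c
#include <stddef.h>
#include <stdint.h>

/* Add two len-byte little-endian integers a and b into sum.
 * The loop walks from the lowest address upward, which is also
 * the direction in which the carry propagates.
 * Returns the carry out of the most significant byte (0 or 1). */
unsigned add_le(uint8_t *sum, const uint8_t *a, const uint8_t *b, size_t len)
{
    unsigned carry = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned t = (unsigned)a[i] + (unsigned)b[i] + carry;
        sum[i] = (uint8_t)(t & 0xFF); /* low 8 bits stay in this byte */
        carry = t >> 8;               /* overflow moves to the next byte */
    }
    return carry;
}
```

With big-endian operands the same loop would have to start at index `len - 1` and count down, which is the extra bookkeeping that the little-endian layout avoids.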

II. Endianness Issues in Data Transmission

When a little-endian machine needs to send multi-byte values over the network, it must first convert them from its local little-endian mode to big-endian mode. This is because the TCP/IP protocols specify big-endian, known as network byte order, for their multi-byte fields; only when both sides follow the same order can the receiver parse the data correctly. For example, if a computer using an x86 architecture (little-endian mode) wants to send the 32-bit integer 0x12345678 to another computer, it needs to convert this value to big-endian byte order (0x12 0x34 0x56 0x78) before sending it.

When receiving data, little-endian machines need to convert the received big-endian data back to little-endian so that it can be processed correctly on the local machine. For example, when this x86 computer receives a 32-bit integer from the network, it will first convert the data from big-endian to little-endian before proceeding with further processing. This conversion process is like a translation job, ensuring that data can be correctly communicated between different "language environments" (endian modes).
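The send and receive conversions described above can be sketched with the POSIX functions `htonl()` and `ntohl()` from `<arpa/inet.h>`. The wrappers `put_u32_on_wire` and `get_u32_from_wire` are hypothetical names for this illustration; a real program would pass the buffer to its own socket code.

```c
#include <arpa/inet.h> /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Sender side: convert a host-order value to network byte order
 * and copy its bytes into a 4-byte wire buffer. */
void put_u32_on_wire(uint8_t wire[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(wire, &net, sizeof net);
}

/* Receiver side: copy 4 bytes from the wire buffer and convert
 * them back to the host's own byte order. */
uint32_t get_u32_from_wire(const uint8_t wire[4])
{
    uint32_t net;
    memcpy(&net, wire, sizeof net);
    return ntohl(net);
}
```

On a big-endian host both functions reduce to plain copies, since `htonl()` and `ntohl()` leave the value unchanged there; the same source code therefore works on either kind of machine.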

The primary reason network protocols enforce big-endian byte order is to ensure data consistency and compatibility. Different computers may use different endianness modes, and without a unified standard, data can become corrupted during transmission. For example, data sent by a little-endian machine might be incorrectly parsed on a big-endian machine, leading to erroneous data processing. By uniformly adopting big-endian byte order, network protocols build a bridge for communication between computers with different endianness modes, enabling accurate data transmission and sharing across the network.

Endianness conversion

When processing data, especially in network communication and file reading/writing, it may be necessary to convert between big-endian and little-endian. Below are some common methods for endianness conversion, including using standard library functions and manual implementation.

Use standard library functions

These functions are not part of the C standard itself, but POSIX systems provide them in `<arpa/inet.h>` for converting to and from network byte order, and they can be used to perform endianness conversion. Here are the commonly used functions:

htonl(): Converts host byte order to network byte order (32-bit integer).

htons(): Converts host byte order to network byte order (16-bit integer).

ntohl(): Converts network byte order to host byte order (32-bit integer).

ntohs(): Converts network byte order to host byte order (16-bit integer).
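For the manual implementation mentioned earlier, the byte order of a value can be reversed with shifts and masks. The following is a minimal sketch; `swap16` and `swap32` are hypothetical names, and unlike `htonl()` they always swap, so the caller has to decide whether a swap is needed on the current host.

```c
#include <stdint.h>

/* Reverse the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}
```

For example, `swap32(0x12345678)` yields 0x78563412, and applying the same function twice returns the original value.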
