Go memory layout and struct alignment

Have you ever wondered whether the order of the fields in a Go struct has any impact on memory consumption or application performance? Long story short: it can, so let’s see why!

Some context

We would expect the following struct to have a compile-time size of 28 bytes (16 + 4 + 8), but unsafe.Sizeof reports a size of 32 bytes… what’s happening?

package main

import (
	"fmt"
	"unsafe"
)

type container struct {
	s    string   // 16 bytes
	f    float32  // 4 bytes
	ptrf *float32 // 8 bytes on x86-64
}

func main() {
	x := container{}
	fmt.Printf("Size of x: %v bytes\n", unsafe.Sizeof(x))
}

Output:
Size of x: 32 bytes

On modern processors, the compiler is constrained in how it lays out basic data types in memory, in order to make memory accesses faster. But why? The CPU reads and writes memory most efficiently when the data is self-aligned, which means that the data’s memory address is a multiple of the data’s size.
In Go, the specification guarantees the size of numeric types and lists a few alignment properties for variables of any type, struct fields and array elements. Taking numeric types on x86-64 as an example: single-byte types such as byte, uint8 and int8 can start at any address; 2-byte types such as int16 must start at an even address; 4-byte types such as int32 must start at an address divisible by 4; and 8-byte types such as int64 must start at an address divisible by 8. Data that respects these rules is said to be self-aligned, and this makes access faster because the CPU can fetch it with a single instruction. The reason is that the x86-64 CPU has a 64-bit (8-byte) bus and fetches data from main memory at 8-byte aligned addresses. To fetch a given piece of data, the CPU fetches 8 bytes starting at the highest multiple of 8 that is not greater than the data’s address. Let’s see with an example what happens when data is not self-aligned:

Let’s assume the address of byte 0 is 8-byte aligned and that the CPU wants to fetch the data held in our *int variable, which starts at an odd address. Because this address is not self-aligned, the 8 bytes of the variable straddle two words, so the CPU needs to perform 2 fetch operations, each one word (8 bytes) in length:

  1. Load bytes 0-7
  2. Load bytes 8-15
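
The two loads can be reproduced with a little arithmetic. The sketch below is only an illustration of the simplified model used in this post (a CPU that fetches aligned 8-byte words); wordSize and wordsTouched are hypothetical names introduced for the example, not part of any Go API.

```go
package main

import "fmt"

// wordSize is the fetch granularity assumed by the simplified model in
// this post: aligned 8-byte words.
const wordSize = 8

// wordsTouched reports how many aligned 8-byte words have to be fetched
// to read size bytes starting at addr. size must be greater than zero.
func wordsTouched(addr, size uintptr) uintptr {
	first := addr / wordSize
	last := (addr + size - 1) / wordSize
	return last - first + 1
}

func main() {
	// An 8-byte value at address 1 covers bytes 1-8: words 0 and 1.
	fmt.Println(wordsTouched(1, 8)) // 2
	// The same value at address 8 covers bytes 8-15: word 1 only.
	fmt.Println(wordsTouched(8, 8)) // 1
}
```

Any start address that is not a multiple of 8 gives the same result for an 8-byte value: two words instead of one.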

Go structs

Moving on to Go, when laying out a struct we might think that the different fields of the struct will be allocated on contiguous memory addresses, but this is not always true. Let’s see why and how this applies to Go and its compiler via some examples. Let’s start with the example at the beginning of this post:

package main

import (
	"fmt"
	"unsafe"
)

type container struct {
	s    string   // start:  0  len: 16
	f    float32  // start: 16  len:  4
	// pad[4]     // start: 20  len:  4
	ptrf *float32 // start: 24  len:  8
}

func main() {
	x := container{}
	fmt.Printf("Size of x: %v bytes\n", unsafe.Sizeof(x))
	fmt.Printf("Offset of x.ptrf: %v\n", unsafe.Offsetof(x.ptrf))
}

Output:
Size of x: 32 bytes
Offset of x.ptrf: 24

Here the compiler adds 4 bytes of padding after f so that the ptrf field starts at an 8-byte aligned offset.

Let’s take another example to introduce another, more surprising, type of padding:

package main

import (
	"fmt"
	"unsafe"
)

type container2 struct {
	s string // start:  0  len: 16
	b bool   // start: 16  len:  1
	// pad[7]
}

func main() {
	x := container2{}
	fmt.Printf("Size of x: %v bytes\n", unsafe.Sizeof(x))
	fmt.Printf("Align of x: %v\n", unsafe.Alignof(x))
}

Output:
Size of x: 24 bytes
Align of x: 8

We would expect the x variable to have a compile-time size of 17 bytes, but unsafe.Sizeof tells us it has a size of 24 bytes, which means the compiler added 7 bytes of padding, despite the fact that both x.s and x.b are already self-aligned with their respective type sizes! This is because the Go specification (linked in the references below) says that

For a variable x of struct type: unsafe.Alignof(x) is the largest of all the values unsafe.Alignof(x.f) for each field f of x, but at least 1.

This means that the type of x has an alignment of 8 (*) bytes. But its fields only add up to 17 bytes, which is not a multiple of 8. In practice the size of a type has to be a multiple of its alignment, so that every element of an array of that type stays aligned. The compiler therefore adds 7 bytes of trailing padding: 16 + 1 + 7 = 24, which is a multiple of 8, and now it all checks out!

Long story short, whenever the fields of a struct would not be self-aligned, the compiler adds padding bytes to ensure self-alignment. This ensures optimal access for the CPU, but in turn it may produce sub-optimal memory usage because of the added padding.

We are now able to understand why self-alignment makes for optimal memory access for the CPU:

To fetch the data contained in our *int variable, because the address of the data is now self-aligned, the CPU needs to perform only 1 fetch operation to load bytes 8-15.

A simple but sometimes effective technique to pack your struct and reduce padding (i.e. waste) is to order fields by size, in descending order (note that for the examples listed above this would not have helped: container would still be 32 bytes and container2 still 24). Of course this doesn’t always make sense, as in most cases code readability trumps memory and performance optimisations, so another solution is to group fields logically and apply the technique within each group. This has the added benefit of improving cache locality.

The CPU cache

What does it mean to improve cache locality? It means increasing the chances that the data we need is already available in the CPU cache, where it can be retrieved much faster. Let’s see what happens: during the fetch phase, the CPU collects an instruction and prepares it for decoding. If the requested data is not already present in the CPU caches (L1/L2), it is fetched from main memory and placed in the caches along with its neighbouring bytes, in the hope that those bytes will be accessed soon, avoiding a memory access and thus improving performance. The chunks of memory fetched by the CPU and put in the cache are called cache lines, and on x86-64 architectures a cache line is typically 64 bytes.
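
To get a rough feel for why struct size matters here, the sketch below counts how many cache lines a contiguous slice of structs occupies. It is a back-of-the-envelope model only: it assumes 64-byte cache lines and a slice that starts on a cache-line boundary, and cacheLineSize and cacheLinesFor are hypothetical names. The element sizes (24 and 16 bytes) are chosen purely for illustration.

```go
package main

import "fmt"

// cacheLineSize is an assumption: 64 bytes is typical on x86-64.
const cacheLineSize = 64

// cacheLinesFor returns the number of cache lines needed to hold n
// contiguous elements of elemSize bytes each, assuming the first
// element starts on a cache-line boundary.
func cacheLinesFor(n, elemSize uintptr) uintptr {
	return (n*elemSize + cacheLineSize - 1) / cacheLineSize
}

func main() {
	fmt.Println(cacheLinesFor(1000, 24)) // 375
	fmt.Println(cacheLinesFor(1000, 16)) // 250
}
```

In this model, shrinking each element from 24 to 16 bytes means a full scan of the slice pulls in a third fewer cache lines.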

(*) The struct type container2 has an alignment of 8 bytes because that is the largest alignment among its fields: a string is made of a pointer to the data and a length, each 8 bytes in compile-time size, see https://stackoverflow.com/questions/65878177/unsafe-sizeof-says-any-string-takes-16-bytes-but-how

References:

  • http://www.catb.org/esr/structure-packing/
  • https://go.dev/ref/spec#Size_and_alignment_guarantees
  • https://go101.org/article/memory-layout.html
  • https://pkg.go.dev/unsafe#pkg-overview
  • https://go.googlesource.com/go/+/refs/heads/dev.regabi/src/cmd/compile/internal-abi.md
  • https://stackoverflow.com/questions/65878177/unsafe-sizeof-says-any-string-takes-16-bytes-but-how
Written on April 22, 2024