kilabit.info
Simple | Small | Stable | Secure

Re-learning slice

Shulhan
15 March 2020

This article show how the slice works in Go language in practice, by using code examples which you can try by yourself.

The original source of this article is the following slide that I submit to GopherCon Singapore 2019,

Overview

Most of the time we forgot or did not know how actually the slices works. Most of the time we just assume that slices is dynamic array. By relearning how slices works, we can possible avoid slice’s “gotcha” or take advantage of it. In this article we will review back how the slices works and maybe by knowing how it work we can write a better program.

The content of this article is not new, if you already read and understand Effective Go, The Go Programming Language Specification, and The Go Blog - Go Slices: usage and internals, most of the topics in here already discussed there. This article try to emphasis on code examples, learning by doing, step by step.

Back to the basic: What is an array?

An array is sequence of value of single type with fixed size.

There are two elements needed to create an array: size and type. The size is defined inside the square bracket and the type is defined after the closing bracket.

ArrayType   = "[" ArrayLength "]" ElementType .
ArrayLength = Expression .
ElementType = Type .

Array size is part of its type

One of the first properties of array is the size of an array is part of its type.

x := [5]int{1, 2, 3}
y := [5]int{3, 2, 1}
z := [5]int{1, 2, 3}

fmt.Printf("x == y: %v\n", x == y) // false
fmt.Printf("x == z: %v\n", x == z) // true

In the above snippet, x, y, and z are array with size 5 and int as its type.

Array x have the same size with y but does not contains the same order of value, so if we compare them it will return false.

Array x have the same size with z and contains the same value in the same order, so if we compare them it will return true.

Let say that we have another array a with different size from x,

a := [4]int{1, 2, 3}

If we try to compare them

fmt.Printf("x == a: %v\n", x == a)

the compiler will thrown an error,

[...] invalid operation: x == a (mismatched types [5]int and [4]int)

Out of curiosity, what would happen if we access address of array out of its range?

fmt.Printf("out of bounds> &in[5]:%p\n", &in[5])

The compiler will thrown an error at compile time,

./main_test.go:14:66: invalid array index 5 (out of bounds for 5-element array)

Arrays elements are zeroed

Once the array is created, all of its elements are set to its zero value.

in := [5]int{10, 20, 30}
fmt.Printf("contents > in:%v\n", in)
// Output:
// contents > in:[10 20 30 0 0]

In the above snippet, in is an array of int with size set to 5. The values for the first three elements at index 0, 1, 2 are set to 10, 20, 30 and the rest of values will be set to 0 by the compiler.

Array are values

Assigning or passing an array to another array will copy all of its values.

To understand this let see the address of instance of array and its first value.

in := [5]int{1, 2, 3}
fmt.Printf("&in:%p &in[0]:%p &in[1]:%p\n", &in, &in[0], &in[1])

// Output:
// &in:0xc0000122a0 &in[0]:0xc0000122a0 &in[1]:0xc0000122a8

(Note: The %p format print the address of value in memory, and it will be different on each system.)

The address of array in can be accessed using &in, and we got 0xc0000122a0. The address of first value can be accessed using &in[0], and we got 0xc0000122a0. This means the address of instance of array and address of its first element is equal.

Now, what would happen if we pass an array to function?

func passArray(y [5]int) {
	fmt.Printf("&y:%p\n", &y) // &y:0x45e020
	y[0] = 90
	fmt.Printf("y: %v\n", y)  // y: [90 20 30 0 0]
}

func main() {
	x := [5]int{10, 20, 30}

	fmt.Printf("&x:%p\n", &x) // &x:0x45e000

	passArray(x)

	fmt.Printf("x: %v\n", x) // x: [10 20 30 0 0]
}

Array x is created with address is 0xc00007a0f0. When we pass x to passArray() function, x values is copied to y. Array y is in different address 0xc00007a120 but have the same size and values as x. We can test this by changing the first value of y to 9 and print the x after call to passArray, the values in x does not affected by assignment in passArray() function.

So, x and y are two different arrays with the same size and values (on initial pass).

If we want passArray() function to be able to change the value that it received in y and the changes affected in x, we can pass x by using address and receive them by using pointer in y.

func passArray(y *[5]int) {
	fmt.Printf("y:%p\n", y) // &y:0x45e000
	y[0] = 90
	fmt.Printf("y: %v\n", y)  // y: [90 20 30 0 0]
}

func main() {
	x := [5]int{1, 2, 3}

	fmt.Printf("&x:%p\n", &x) // &x:0x45e000

	passArray(&x)

	fmt.Printf("x: %v\n", x) // x: [90 20 30 0 0]
}

Since y is a pointer to array of [5]int, we access the address without &, and we can see that x and y now have the same address. Changing any value in y will affect x.

What is slice?

In this section I will not discuss how to create slice, zero value of slice, growing slice, since most of article in Effective Go, and others tutorial already explain it in detail. Instead, we will do a reverse learning, or learning by doing.

A slice is a descriptor of an array segment. It consists of a pointer to the array, the length of the segment, and its capacity (the maximum length of the segment).

An attempt to rewrite slice with struct will result in linked-list.

type Slice struct {
    Len int
    Cap int
    elements *Element
}

Type Element struct {
    next *Element
}

But instead of linked-list, Go internal use an array in *Element, or what we will call as backing storage of slice.

The backing storage

To test this hypothesis let break then down one by one, first by printing their address.

func passSlice(xx []int) {
	fmt.Printf("xx> &xx:%p &xx[0]:%p\n", &xx, &xx[0])
	fmt.Printf("xx> len:%d cap:%d\n", len(xx), cap(xx))
}

func main() {
	x := []int{1, 2, 3}

	fmt.Printf("x> &x:%p &x[0]:%p\n", &x, &x[0])
	fmt.Printf("x> len:%d cap:%d\n", len(x), cap(x))

	passSlice(x)
}

It will print the following output,

x> &x:0x40a0e0 &x[0]:0x40e020
x> len:3 cap:3
xx> &xx:0x40a0f0 &xx[0]:0x40e020
xx> len:3 cap:3

Did you see the different? Go create new slice when passing it to function, but the backing storage is point to the same address. The address of x is different with xx, x and xx is different instance of slice with the same type. The address of first value of x and xx are same 0x40e020, that means x and xx share the same backing storage.

Of course both have the same length and capacity.

What would happen if we change the content of slice passed in function?

func sliceModifyByIndex(xx []int) {
	xx[0] = 0
}

func main() {
	x := []int{1, 2, 3}

	sliceModifyByIndex(x)

	fmt.Printf("%v\n", x)
}

Output,

[0 2 3]

This confirm our hypothesis that slice share the same backing storage.

But wait, things are become more interesting. What would happen if we append a value to slice in function?

func sliceAppend(xx []int) {
	xx = append(xx, 4)
}

func main() {
	x := []int{1, 2, 3}

	sliceAppend(x)

	fmt.Printf("%v\n", x)
}

If you thought the output would be [1 2 3 4] you are wrong. The program will print [1 2 3].

What happened? Let see their addresses.

func sliceAppendAddress(xx []int) {
	fmt.Printf("xx before > &[0]:%p len:%d cap:%d\n", &xx[0], len(xx), cap(xx))
	xx = append(xx, 4)
	fmt.Printf("xx after  > &[0]:%p len:%d cap:%d\n", &xx[0], len(xx), cap(xx))
}

func main() {
	x := []int{1, 2, 3}

	fmt.Printf("x before  > &[0]:%p len:%d cap:%d\n", &x[0], len(x), cap(x))
	sliceAppendAddress(x)
	fmt.Printf("x after   > &[0]:%p len:%d cap:%d\n", &x[0], len(x), cap(x))
}

// Output:
// x before  > &[0]:0x40e020 len:3 cap:3
// xx before > &[0]:0x40e020 len:3 cap:3
// xx after  > &[0]:0x456020 len:4 cap:8
// x after   > &[0]:0x40e020 len:3 cap:3

Before we append the slice xx the length and capability of xx and x are same: 3. After we append new value to xx the backing storage of xx is changed, but the change does not affect the x. This means the backing storage of xx after append is different with backing storage of x. We can see this from the address of first value of xx that change from 0x40e020 to 0x456020, but the address of first value of x still the same as before and after the function is called.

So, the address change because in previous exercise the slice xx does not have enough capacity to add new item to backing storage.

What if we provide enough capacity and pass to function and let function append the slice again, there will be no allocation and backing storage still reference the same right?

func sliceAppend(xx []int) {
	fmt.Printf("xx before > len:%d cap:%d\n", len(xx), cap(xx))
	xx = append(xx, 4)
	fmt.Printf("xx after  > len:%d cap:%d\n", len(xx), cap(xx))
}

func main() {
	x := make([]int, 0, 5)
	x = append(x, 1, 2, 3) // [1 2 3]

	fmt.Printf("x before  > len:%d cap:%d\n", len(x), cap(x))
	sliceAppend(x)
	fmt.Printf("x after   > %v\n", x)
}

// Output:
// x before  > len:3 cap:5
// xx before > len:3 cap:5
// xx after  > len:4 cap:5
// x after   > [1 2 3]

Why? Remember that the variable xx in function is different with x at the outside. The length of xx is growing with append, but the length of x does not change.

Since the backing storage of xx is not reallocated (the capacity is large enough for new item), does the address of backing storage in xx change? Or equal with x?

func sliceAppend(xx []int) {
	fmt.Printf("&xx[0] before:%p\n", &xx[0])
	xx = append(xx, 4)
	fmt.Printf("&xx[0] after :%p\n", &xx[0])
}

func main() {
	x := make([]int, 0, 5)
	x = append(x, 1, 2, 3)

	fmt.Printf("&x[0] before :%p\n", &x[0])
	sliceAppendAddress(x)
	fmt.Printf("&x[0] after  :%p\n", &x[0])
}

// Output:
// &x[0] before :0x456000
// &xx[0] before:0x456000
// &xx[0] after :0x456000
// &x[0] after  :0x456000

So both the address of x and xx does not change. But how come when we print x its output is [1 2 3] not [1 2 3 4]? Because printing x limited by their length.

To see the new value 4 in x, we can extend its length using x = x[:4],

func sliceAppend(xx []int) {
	fmt.Printf("xx before > len:%d cap:%d\n", len(xx), cap(xx))
	xx = append(xx, 4)
	fmt.Printf("xx after  > len:%d cap:%d\n", len(xx), cap(xx))
}

func main() {
	x := make([]int, 0, 5)
	x = append(x, 1, 2, 3) // [1 2 3]

	fmt.Printf("x before  > len:%d cap:%d\n", len(x), cap(x))
	sliceAppend(x)
	x = x[:4]
	fmt.Printf("x after   > %v\n", x)
	fmt.Printf("x after   > len:%d cap:%d\n", len(x), cap(x))
}

// Output:
// x before  > len:3 cap:5
// xx before > len:3 cap:5
// xx after  > len:4 cap:5
// x after   > [1 2 3 4]
// x after   > len:4 cap:5

What we have learned?

  • Slice passed by value

  • Unless the address in backing storage is not changed, the slice receiver can change the content of its referenced

  • If the address in backing storage changed, both receiver and caller/assigner will have different backing storage

Slicing slice

Now, that we know how the slice works when passing to function, we got to the second point of relearning slices: slicing slice.

What is the output of this snippet?

s := []int{1, 2, 3, 4, 5, 6, 7}
fmt.Printf("s > len:%d cap:%d\n", len(s), cap(s))

ss := s[2:4]
fmt.Printf("ss> len:%d cap:%d\n", len(ss), cap(ss))

// s > len:6 cap:7
// ss> len:? Cap:?
// A. 2 2
// B. 2 5
// C. 2 7

In the simplest form, the input for slicing a slice is

	T [ low : high )

which return the new slice, or the sub-slice.

The sub-slice will contains parent elements start from low index and end with high index, exclusive. The length of sub-slice is set to high - low. The capability of sub-slice is set to cap(T) - low. If low is not defined, it will be default to 0. If high is not defined, it will be default to len(T).

If your answer to previous exercise is B. 2 5, you are correct.

In the full form, the input for slicing a slice have third parameter, max,

	[ low : high : max )

This syntax only applicable for an array, pointer to array, or slice; but not a string.

The result, sub-slice, is the same with simple form, but the sub-slice will have capacity set to max - low.

Address of sub-slice

The next question is what is the address of sub-slice?

s := []int{10, 20, 30, 40, 50, 60, 70}
fmt.Printf(" &s:%p  &s[2]:%p\n", &s, &s[2])

ss := s[2:4]
fmt.Printf("&ss:%p &ss[0]:%p\n", &ss, &ss[0])

// &s :0xc00000a0a0  &s[2]:0xc000018250
// &ss:0xc00000a0c0 &ss[0]:0xc000018250

The sub-slice ss created by slicing s start from index 2 until 4 (values [30 40]). The address of index 2 in slice s is 0xc000018250 which is equal to the address of first value (index 0) in ss, 0xc000018250. This means that slice and its sub-slice share the same backing storage.

To prove this lets change the content of sub slice,

s := []int{10, 20, 30, 40, 50, 60, 70}
ss := s[2:4]
ss[0] = 80

fmt.Printf("s :%v\n", s)

// s :[10 20 80 40 50 60 70]

When we change the value of index 0 in sub-slice ss to 80, the value in the slice s at index 2 (which point to the same address) is also change.

Appending to sub-slice

In previous section we said that "slice and sub-slice share the same backing storage", is it always true?

In the following example, we will append the new value to sub-slice and print the result of both original slice and its sub-slice.

s := []int{10, 20, 30, 40, 50, 60, 70}
ss := s[2:4]

ss = append(ss, 80)

fmt.Printf("ss after :%v\n", ss)
fmt.Printf("s after  :%v\n", s)

// ss after :[30 40 80]
// s after  :[10 20 30 40 80 60 70]

Surprise?

This is what happened,

func main() {
	s := []int{10, 20, 30, 40, 50, 60, 70}
	ss := s[2:4]

	fmt.Printf("s  before  &s[2]:%p len:%d cap:%d values:%v\n", &s[2], len(s), cap(s), s)
	fmt.Printf("ss before &ss[0]:%p len:%d cap:%d values:%v\n", &ss[0], len(ss), cap(ss), ss)

	ss = append(ss, 80)

	fmt.Printf("s  after  &s[2]:%p len:%d cap:%d values:%v\n", &s[2], len(s), cap(s), s)
	fmt.Printf("ss after &ss[0]:%p len:%d cap:%d values:%v\n", &ss[0], len(ss), cap(ss), ss)
}

// Output:
// s  before  &s[2]:0x45e008 len:7 cap:7 values:[10 20 30 40 50 60 70]
// ss before &ss[0]:0x45e008 len:2 cap:5 values:[30 40]
// s  after  &s[2]:0x45e008 len:7 cap:7 values:[10 20 30 40 80 60 70]
// ss after &ss[0]:0x45e008 len:3 cap:5 values:[30 40 80]

The length of sub-slice ss is 2 and its capability is 5, so append only write the appended value 80 into index 2 (the length) and increase the len to len+1 because the sub-slice ss have enough backing storage for new item.

What would happened if we grow the sub-slice beyond its capacity?

func main() {
	s := []int{10, 20, 30, 40, 50, 60, 70}
	ss := s[2:4]

	fmt.Printf("s  before  &s[2]:%p len:%d cap:%d values:%v\n", &s[2], len(s), cap(s), s)
	fmt.Printf("ss before &ss[0]:%p len:%d cap:%d values:%v\n", &ss[0], len(ss), cap(ss), ss)

	ss = append(ss, 80, 90, 100, 110)

	fmt.Printf("s  after   &s[2]:%p len:%d cap:%d values:%v\n", &s[2], len(s), cap(s), s)
	fmt.Printf("ss after  &ss[0]:%p len:%d cap:%d values:%v\n", &ss[0], len(ss), cap(ss), ss)
}

// Output:
// s  before  &s[2]:0x456008 len:7 cap:7 values:[10 20 30 40 50 60 70]
// ss before &ss[0]:0x456008 len:2 cap:5 values:[30 40]
// s  after   &s[2]:0x456008 len:7 cap:7 values:[10 20 30 40 50 60 70]
// ss after  &ss[0]:0x454030 len:6 cap:12 values:[30 40 80 90 100 110]

Once the slice is growing beyond their capacity, Go will reallocated new backing storage, copy the old value to new backing stroage, and update the backing storage of ss to new one. The sub-slice ss now use new backing storage, different with s.

What we have learned?

  • Sub-slice initial element address is pointer to their original slice

  • Unless the backing storage is not changed, the sub-slice can change the content that its referenced

  • If the backing storage in sub-slice changed, both original slice and fmt.Println(slice) sub-slice will have different backing storage

Zeroing slice

What is the best way to reset slice to zero? Is it s = nil or s = [:0]?

First, lets look how nil behave.

s := []int{1, 2, 3}
fmt.Printf("s> len:%d cap:%d &[0]:%p\n", len(s), cap(s), &s[0])

s = nil
fmt.Printf("s> len:%d cap:%d\n", len(s), cap(s))

s = append(s, 4)
fmt.Printf("s> len:%d cap:%d &[0]:%p\n", len(s), cap(s), &s[0])

// Output:
// s> len:3 cap:3 &[0]:0xc0000ac040
// s> len:0 cap:0
// s> len:1 cap:1 &[0]:0xc00007e0e8

The first backing storage of s have and address at 0xc0000ac040, and after we nil it and append new item, the backing storage change. So, this means nil-ing a slice will release the previous backing storage and create new backing storage when we append new item.

Second, we look how sub-slicing with cap 0.

s := []int{1, 2, 3}
fmt.Printf("s> len:%d cap:%d &[0]:%p\n", len(s), cap(s), &s[0])

s = s[:0]
fmt.Printf("s> len:%d cap:%d\n", len(s), cap(s))

s = append(s, 4)
fmt.Printf("s> len:%d cap:%d &[0]:%p\n", len(s), cap(s), &s[0])
fmt.Printf(“s> %v\n”, s)

// s> len:3 cap:3 &[0]:0xc0000144c0
// s> len:0 cap:3
// s> len:1 cap:3 &[0]:0xc0000144c0
// s> [4]

Zeroing slice using [:0] reset the length to zero and keep and backing storage.

Concolusion

The answer to above question is depends on how you use the slice,

  • use nil if we want to release slice's (and its backing storage),

  • use [:0] if we want to keep the slice backing storage, to minimize reallocation.

Range on slice

The next questions regarding slice is when we do for-range loop on slice, does the second variable returned by range is a copy of item or a pointer to an item? Can we change the value inside for-range loop?

	slice := []int{1, 2, 3}
	for _, item := range slice {
		item += 1
	}
	fmt.Println(slice)

	// Output:
	// [1 2 3]

Let see their address,

	slice := []int{1, 2, 3}
	for x, item := range slice {
		fmt.Printf("&slice[%d]:%p &item:%p\n", x, &slice[x], &item)
		item += 1
	}

	// Output:
	// &slice[0]:0x40e020 &item:0x40e02c
	// &slice[1]:0x40e024 &item:0x40e02c
	// &slice[2]:0x40e028 &item:0x40e02c

Looking at the address, the item is a scope variable inside loop. Which means, each iteration in for-range loop, it will copy the value in slice into item. Changing the value of item will not change the value in slice.

Only when the slice type is pointer, we can change the value using for-range loop,

	a := int(1)
	b := int(2)
	c := int(3)
	slice := []*int{&a, &b, &c}
	for x, item := range slice {
		fmt.Printf("&slice[%d]:%p &item:%p\n", x, &slice[x], &item)
		*item += 1
	}

	fmt.Printf("Value of slice:")
	for x := range slice {
		fmt.Printf(" slice[%d]:%d", x, *slice[x])
	}
	fmt.Println()
	fmt.Printf("Value of a:%d, b:%d, c:%d\n", a, b, c)

I think this is obvious for someone who familiar with pointer.

Slice gotchas

In this section we look what are commons mistake that we do when using slice.

Too much reallocation

Calling multiple append() on slices values may cause memory re-allocation.

In this example we print the length and capability of slice before and appending the slice.

func doX(in []int) (out []int){
    for _, v := range in {
    	fmt.Printf("before> out len:%d cap:%d\n", len(out), cap(out))
        out = append(out, v)
    	fmt.Printf("after > out len:%d cap:%d\n", len(out), cap(out))
    }
    return out
}

doX([]int{1,2,3,4,5})

We found that doX do 4 reallocation to slice out,

// Output: 4 re-allocation
before> out len:0 cap:0
after > out len:1 cap:1
before> out len:1 cap:1
after > out len:2 cap:2
before> out len:2 cap:2
after > out len:3 cap:4
before> out len:3 cap:4
after > out len:4 cap:4
before> out len:4 cap:4
after > out len:5 cap:8

The slice out backing storage growth from 0 to 1, 2, 4, and 8.

To minimize this we can allocate the capability to the possible maximum values that we may know. Since we know that out will at least take all length of in, we can allocate the initialize storage to len(in),

func doX(in []int) (out []int){
    out = make([]int, 0, len(in))
    for _, v := range in {
        out = append(out, v)
    }
    return out
}
doX([]int{1,2,3,4,5})

The allocation now decreased to 1 (on initial make),

// Output: 1 allocation
before> out len:0 cap:5
after > out len:1 cap:5
before> out len:1 cap:5
after > out len:2 cap:5
before> out len:2 cap:5
after > out len:3 cap:5
before> out len:3 cap:5
after > out len:4 cap:5
before> out len:4 cap:5
after > out len:5 cap:5

The good news is we have static analysis tool for that: prealloc.

Unreleased memory allocation

A quote from go blog,

re-slicing a slice doesn't make a copy of the underlying array. The full array will be kept in memory until it is no longer referenced. Occasionally this can cause the program to hold all the data in memory when only a small piece of it is needed.
— Andrew Gerrand
The Go Blog - Go Slices: usage and internals

Given the following slicing statement,

msg.id = packet[0:4]

Memory allocated by packet will not released until msg.id get nil-ed or msg itself has no reference.

I avoid the term “memory leak” here, because technically part of the memory content is still in use, but not whole of it. The term “memory leak” is when we allocated it but forgot to free-it.

Just like reading Term of Services, sometimes we skip reading the content and looking only how to do X while forgot the internal detail.

Bad news is, AFAIK, there is no static analysis tool to help us with it. Your best friend right now is pprof.

Taking advantages of gotchas

If we knew that the original slices is cacheable or reusable, we can take advantage of it to minimize memory usage.

Case example, assume that we have a cacheable packet, that need to be parsed, checked, and validated; we can reuse the content by sub-slicing it.

Assume that a packet is sequences of characters with the following format,

key:value

we create a struct to store the key and value,

type Field struct {
    Key []byte
    Value []byte
}

Common approaches when parsing it is by appending it one by one, and each Field’s key and name will allocated new slices.

field := Field{}
packet := []byte("key:value")

for _, c := range packet {
	if c == ':' {
		break
	}
	field.Key = append(field.Key, c)
}
field.Value = append(field.Value, packet[len(field.Key)+1:]...)

fmt.Printf("Key: %s len:%d cap:%d\n", field.Key, len(field.Key), cap(field.Key))
fmt.Printf("Value: %s len:%d cap:%d\n", field.Value, len(field.Value), cap(field.Value))

fmt.Printf("packet storage: %p\n", &packet[0])
fmt.Printf("field.Key storage: %p\n", &field.Key[0])
fmt.Printf("field.Value storage: %p\n", &field.Value[0])

// Key: key len:3 cap:8
// Value: value len:5 cap:8
// packet storage: 0x40e020
// field.Key storage: 0x40e030
// field.Value storage: 0x40e038

In this approach the backing storage for packet, Key, and Value are different, and we have 4 * 2 re-allocation.

An alternative to minimize memory allocation is to use the original backing array and point the Field Key and Value into it.

var x := 0
// Get the beginning and end of index key
for ; x < len(packet); x++ {
	if packet[x] == ':'
		break
	}
}

field.Key = packet[:x]
field.Value = packet[x+1:]

With this approach all of the slice use single backing storage.

slices_gotcha_subslicing_original

Remember, using this approach require a careful attention on where the instance of Field go and released.

References

[EFF-GO] The Go Authors, “Effective Go”, https://golang.org/doc/effective_go.html, February 2019.

[GO-SPEC] The Go Authors, “The Go Programming Language Specification”, https://golang.org/ref/spec, May 2018.

[GO-BLOG] Gerrand, Andrew, “The Go Blog - Go Slices: usage and internals”, https://blog.golang.org/go-slices-usage-and-internals, January 2011.