This article show how the slice works in Go language in practice, by using code examples which you can try by yourself.
The original source of this article is the following slide that I submit to GopherCon Singapore 2019,
Overview
Most of the time we forgot or did not know how actually the slices works. Most of the time we just assume that slices is dynamic array. By relearning how slices works, we can possible avoid slice’s “gotcha” or take advantage of it. In this article we will review back how the slices works and maybe by knowing how it work we can write a better program.
The content of this article is not new, if you already read and understand Effective Go, The Go Programming Language Specification, and The Go Blog - Go Slices: usage and internals, most of the topics in here already discussed there. This article try to emphasis on code examples, learning by doing, step by step.
Back to the basic: What is an array?
An array is sequence of value of single type with fixed size.
There are two elements needed to create an array: size and type. The size is defined inside the square bracket and the type is defined after the closing bracket.
ArrayType = "[" ArrayLength "]" ElementType . ArrayLength = Expression . ElementType = Type .
Array size is part of its type
One of the first properties of array is the size of an array is part of its type.
x := [5]int{1, 2, 3} y := [5]int{3, 2, 1} z := [5]int{1, 2, 3} fmt.Printf("x == y: %v\n", x == y) // false fmt.Printf("x == z: %v\n", x == z) // true
In the above snippet, x
, y
, and z
are array with size 5 and int
as its
type.
Array x
have the same size with y
but does not contains the same order of
value, so if we compare them it will return false
.
Array x
have the same size with z
and contains the same value in the same
order, so if we compare them it will return true
.
Let say that we have another array a
with different size from x
,
a := [4]int{1, 2, 3}
If we try to compare them
fmt.Printf("x == a: %v\n", x == a)
the compiler will thrown an error,
[...] invalid operation: x == a (mismatched types [5]int and [4]int)
Out of curiosity, what would happen if we access address of array out of its range?
fmt.Printf("out of bounds> &in[5]:%p\n", &in[5])
The compiler will thrown an error at compile time,
./main_test.go:14:66: invalid array index 5 (out of bounds for 5-element array)
Arrays elements are zeroed
Once the array is created, all of its elements are set to its zero value.
in := [5]int{10, 20, 30} fmt.Printf("contents > in:%v\n", in) // Output: // contents > in:[10 20 30 0 0]
In the above snippet, in
is an array of int
with size set to 5.
The values for the first three elements at index 0, 1, 2 are set to 10, 20, 30
and the rest of values will be set to 0 by the compiler.
Array are values
Assigning or passing an array to another array will copy all of its values.
To understand this let see the address of instance of array and its first value.
in := [5]int{1, 2, 3} fmt.Printf("&in:%p &in[0]:%p &in[1]:%p\n", &in, &in[0], &in[1]) // Output: // &in:0xc0000122a0 &in[0]:0xc0000122a0 &in[1]:0xc0000122a8
(Note: The %p
format print the address of value in memory, and it will be
different on each system.)
The address of array in
can be accessed using &in
, and we got
0xc0000122a0
.
The address of first value can be accessed using &in[0]
, and we got
0xc0000122a0
.
This means the address of instance of array and address of its first element
is equal.
Now, what would happen if we pass an array to function?
func passArray(y [5]int) { fmt.Printf("&y:%p\n", &y) // &y:0x45e020 y[0] = 90 fmt.Printf("y: %v\n", y) // y: [90 20 30 0 0] } func main() { x := [5]int{10, 20, 30} fmt.Printf("&x:%p\n", &x) // &x:0x45e000 passArray(x) fmt.Printf("x: %v\n", x) // x: [10 20 30 0 0] }
Array x
is created with address is 0xc00007a0f0
.
When we pass x
to passArray()
function, x
values is copied to y
.
Array y
is in different address 0xc00007a120
but have the same size and
values as x
.
We can test this by changing the first value of y
to 9
and print the x
after call to passArray
, the values in x
does not affected by assignment
in passArray()
function.
So, x
and y
are two different arrays with the same size and values (on
initial pass).
If we want passArray()
function to be able to change the value that it
received in y
and the changes affected in x
, we can pass x
by using
address and receive them by using pointer in y
.
func passArray(y *[5]int) { fmt.Printf("y:%p\n", y) // &y:0x45e000 y[0] = 90 fmt.Printf("y: %v\n", y) // y: [90 20 30 0 0] } func main() { x := [5]int{1, 2, 3} fmt.Printf("&x:%p\n", &x) // &x:0x45e000 passArray(&x) fmt.Printf("x: %v\n", x) // x: [90 20 30 0 0] }
Since y
is a pointer to array of [5]int
, we access the address without
&
, and we can see that x
and y
now have the same address.
Changing any value in y
will affect x
.
What is slice?
In this section I will not discuss how to create slice, zero value of slice, growing slice, since most of article in Effective Go, and others tutorial already explain it in detail. Instead, we will do a reverse learning, or learning by doing.
From the go blog The Go Blog - Go Slices: usage and internals,
A slice is a descriptor of an array segment. It consists of a pointer to the array, the length of the segment, and its capacity (the maximum length of the segment).
An attempt to rewrite slice with struct will result in linked-list.
type Slice struct { Len int Cap int elements *Element } Type Element struct { next *Element }
But instead of linked-list, Go internal use an array in *Element
, or what we
will call as backing storage of slice.
The backing storage
To test this hypothesis let break then down one by one, first by printing their address.
func passSlice(xx []int) { fmt.Printf("xx> &xx:%p &xx[0]:%p\n", &xx, &xx[0]) fmt.Printf("xx> len:%d cap:%d\n", len(xx), cap(xx)) } func main() { x := []int{1, 2, 3} fmt.Printf("x> &x:%p &x[0]:%p\n", &x, &x[0]) fmt.Printf("x> len:%d cap:%d\n", len(x), cap(x)) passSlice(x) }
It will print the following output,
x> &x:0x40a0e0 &x[0]:0x40e020 x> len:3 cap:3 xx> &xx:0x40a0f0 &xx[0]:0x40e020 xx> len:3 cap:3
Did you see the different?
Go create new slice when passing it to function, but the backing storage is
point to the same address.
The address of x
is different with xx
, x
and xx
is different instance
of slice with the same type.
The address of first value of x
and xx
are same 0x40e020
, that means x
and xx
share the same backing storage.
Of course both have the same length and capacity.
What would happen if we change the content of slice passed in function?
func sliceModifyByIndex(xx []int) { xx[0] = 0 } func main() { x := []int{1, 2, 3} sliceModifyByIndex(x) fmt.Printf("%v\n", x) }
Output,
[0 2 3]
This confirm our hypothesis that slice share the same backing storage.
But wait, things are become more interesting. What would happen if we append a value to slice in function?
func sliceAppend(xx []int) { xx = append(xx, 4) } func main() { x := []int{1, 2, 3} sliceAppend(x) fmt.Printf("%v\n", x) }
If you thought the output would be [1 2 3 4]
you are wrong.
The program will print [1 2 3]
.
What happened? Let see their addresses.
func sliceAppendAddress(xx []int) { fmt.Printf("xx before > &[0]:%p len:%d cap:%d\n", &xx[0], len(xx), cap(xx)) xx = append(xx, 4) fmt.Printf("xx after > &[0]:%p len:%d cap:%d\n", &xx[0], len(xx), cap(xx)) } func main() { x := []int{1, 2, 3} fmt.Printf("x before > &[0]:%p len:%d cap:%d\n", &x[0], len(x), cap(x)) sliceAppendAddress(x) fmt.Printf("x after > &[0]:%p len:%d cap:%d\n", &x[0], len(x), cap(x)) } // Output: // x before > &[0]:0x40e020 len:3 cap:3 // xx before > &[0]:0x40e020 len:3 cap:3 // xx after > &[0]:0x456020 len:4 cap:8 // x after > &[0]:0x40e020 len:3 cap:3
Before we append the slice xx
the length and capability of xx
and x
are
same: 3.
After we append new value to xx
the backing storage of xx
is changed, but
the change does not affect the x
.
This means the backing storage of xx
after append is different with backing
storage of x
.
We can see this from the address of first value of xx
that change from
0x40e020
to 0x456020
, but the address of first value of x
still the same
as before and after the function is called.
So, the address change because in previous exercise the slice xx
does not
have enough capacity to add new item to backing storage.
What if we provide enough capacity and pass to function and let function append the slice again, there will be no allocation and backing storage still reference the same right?
func sliceAppend(xx []int) { fmt.Printf("xx before > len:%d cap:%d\n", len(xx), cap(xx)) xx = append(xx, 4) fmt.Printf("xx after > len:%d cap:%d\n", len(xx), cap(xx)) } func main() { x := make([]int, 0, 5) x = append(x, 1, 2, 3) // [1 2 3] fmt.Printf("x before > len:%d cap:%d\n", len(x), cap(x)) sliceAppend(x) fmt.Printf("x after > %v\n", x) } // Output: // x before > len:3 cap:5 // xx before > len:3 cap:5 // xx after > len:4 cap:5 // x after > [1 2 3]
Why?
Remember that the variable xx
in function is different with x
at the
outside.
The length of xx
is growing with append, but the length of x
does not
change.
Since the backing storage of xx
is not reallocated (the capacity is large
enough for new item), does the address of backing storage in xx
change?
Or equal with x
?
func sliceAppend(xx []int) { fmt.Printf("&xx[0] before:%p\n", &xx[0]) xx = append(xx, 4) fmt.Printf("&xx[0] after :%p\n", &xx[0]) } func main() { x := make([]int, 0, 5) x = append(x, 1, 2, 3) fmt.Printf("&x[0] before :%p\n", &x[0]) sliceAppendAddress(x) fmt.Printf("&x[0] after :%p\n", &x[0]) } // Output: // &x[0] before :0x456000 // &xx[0] before:0x456000 // &xx[0] after :0x456000 // &x[0] after :0x456000
So both the address of x
and xx
does not change.
But how come when we print x
its output is [1 2 3]
not [1 2 3 4]
?
Because printing x
limited by their length.
To see the new value 4
in x
, we can extend its length using x = x[:4]
,
func sliceAppend(xx []int) { fmt.Printf("xx before > len:%d cap:%d\n", len(xx), cap(xx)) xx = append(xx, 4) fmt.Printf("xx after > len:%d cap:%d\n", len(xx), cap(xx)) } func main() { x := make([]int, 0, 5) x = append(x, 1, 2, 3) // [1 2 3] fmt.Printf("x before > len:%d cap:%d\n", len(x), cap(x)) sliceAppend(x) x = x[:4] fmt.Printf("x after > %v\n", x) fmt.Printf("x after > len:%d cap:%d\n", len(x), cap(x)) } // Output: // x before > len:3 cap:5 // xx before > len:3 cap:5 // xx after > len:4 cap:5 // x after > [1 2 3 4] // x after > len:4 cap:5
What we have learned?
-
Slice passed by value
-
Unless the address in backing storage is not changed, the slice receiver can change the content of its referenced
-
If the address in backing storage changed, both receiver and caller/assigner will have different backing storage
Slicing slice
Now, that we know how the slice works when passing to function, we got to the second point of relearning slices: slicing slice.
What is the output of this snippet?
s := []int{1, 2, 3, 4, 5, 6, 7} fmt.Printf("s > len:%d cap:%d\n", len(s), cap(s)) ss := s[2:4] fmt.Printf("ss> len:%d cap:%d\n", len(ss), cap(ss)) // s > len:6 cap:7 // ss> len:? Cap:? // A. 2 2 // B. 2 5 // C. 2 7
In the simplest form, the input for slicing a slice is
T [ low : high )
which return the new slice, or the sub-slice.
The sub-slice will contains parent elements start from low
index and end with high
index, exclusive.
The length of sub-slice is set to high - low
.
The capability of sub-slice is set to cap(T) - low
.
If low
is not defined, it will be default to 0
.
If high
is not defined, it will be default to len(T)
.
If your answer to previous exercise is B. 2 5
, you are correct.
In the full form, the input for slicing a slice have third parameter, max
,
[ low : high : max )
This syntax only applicable for an array, pointer to array, or slice; but not a string.
The result, sub-slice, is the same with simple form, but the sub-slice will
have capacity set to max - low
.
Address of sub-slice
The next question is what is the address of sub-slice?
s := []int{10, 20, 30, 40, 50, 60, 70} fmt.Printf(" &s:%p &s[2]:%p\n", &s, &s[2]) ss := s[2:4] fmt.Printf("&ss:%p &ss[0]:%p\n", &ss, &ss[0]) // &s :0xc00000a0a0 &s[2]:0xc000018250 // &ss:0xc00000a0c0 &ss[0]:0xc000018250
The sub-slice ss
created by slicing s
start from index 2
until 4
(values [30 40]
).
The address of index 2 in slice s
is 0xc000018250
which is equal to the
address of first value (index 0) in ss
, 0xc000018250
.
This means that slice and its sub-slice share the same backing storage.
To prove this lets change the content of sub slice,
s := []int{10, 20, 30, 40, 50, 60, 70} ss := s[2:4] ss[0] = 80 fmt.Printf("s :%v\n", s) // s :[10 20 80 40 50 60 70]
When we change the value of index 0 in sub-slice ss
to 80, the value in the
slice s
at index 2 (which point to the same address) is also change.
Appending to sub-slice
In previous section we said that "slice and sub-slice share the same backing storage", is it always true?
In the following example, we will append the new value to sub-slice and print the result of both original slice and its sub-slice.
s := []int{10, 20, 30, 40, 50, 60, 70} ss := s[2:4] ss = append(ss, 80) fmt.Printf("ss after :%v\n", ss) fmt.Printf("s after :%v\n", s) // ss after :[30 40 80] // s after :[10 20 30 40 80 60 70]
Surprise?
This is what happened,
func main() { s := []int{10, 20, 30, 40, 50, 60, 70} ss := s[2:4] fmt.Printf("s before &s[2]:%p len:%d cap:%d values:%v\n", &s[2], len(s), cap(s), s) fmt.Printf("ss before &ss[0]:%p len:%d cap:%d values:%v\n", &ss[0], len(ss), cap(ss), ss) ss = append(ss, 80) fmt.Printf("s after &s[2]:%p len:%d cap:%d values:%v\n", &s[2], len(s), cap(s), s) fmt.Printf("ss after &ss[0]:%p len:%d cap:%d values:%v\n", &ss[0], len(ss), cap(ss), ss) } // Output: // s before &s[2]:0x45e008 len:7 cap:7 values:[10 20 30 40 50 60 70] // ss before &ss[0]:0x45e008 len:2 cap:5 values:[30 40] // s after &s[2]:0x45e008 len:7 cap:7 values:[10 20 30 40 80 60 70] // ss after &ss[0]:0x45e008 len:3 cap:5 values:[30 40 80]
The length of sub-slice ss
is 2 and its capability is 5, so append only
write the appended value 80
into index 2 (the length) and increase the len
to len+1
because the sub-slice ss
have enough backing storage for new
item.
What would happened if we grow the sub-slice beyond its capacity?
func main() { s := []int{10, 20, 30, 40, 50, 60, 70} ss := s[2:4] fmt.Printf("s before &s[2]:%p len:%d cap:%d values:%v\n", &s[2], len(s), cap(s), s) fmt.Printf("ss before &ss[0]:%p len:%d cap:%d values:%v\n", &ss[0], len(ss), cap(ss), ss) ss = append(ss, 80, 90, 100, 110) fmt.Printf("s after &s[2]:%p len:%d cap:%d values:%v\n", &s[2], len(s), cap(s), s) fmt.Printf("ss after &ss[0]:%p len:%d cap:%d values:%v\n", &ss[0], len(ss), cap(ss), ss) } // Output: // s before &s[2]:0x456008 len:7 cap:7 values:[10 20 30 40 50 60 70] // ss before &ss[0]:0x456008 len:2 cap:5 values:[30 40] // s after &s[2]:0x456008 len:7 cap:7 values:[10 20 30 40 50 60 70] // ss after &ss[0]:0x454030 len:6 cap:12 values:[30 40 80 90 100 110]
Once the slice is growing beyond their capacity, Go will reallocated new
backing storage, copy the old value to new backing stroage, and update the
backing storage of ss
to new one.
The sub-slice ss
now use new backing storage, different with s
.
What we have learned?
-
Sub-slice initial element address is pointer to their original slice
-
Unless the backing storage is not changed, the sub-slice can change the content that its referenced
-
If the backing storage in sub-slice changed, both original slice and fmt.Println(slice) sub-slice will have different backing storage
Zeroing slice
What is the best way to reset slice to zero?
Is it s = nil
or s = [:0]
?
First, lets look how nil behave.
s := []int{1, 2, 3} fmt.Printf("s> len:%d cap:%d &[0]:%p\n", len(s), cap(s), &s[0]) s = nil fmt.Printf("s> len:%d cap:%d\n", len(s), cap(s)) s = append(s, 4) fmt.Printf("s> len:%d cap:%d &[0]:%p\n", len(s), cap(s), &s[0]) // Output: // s> len:3 cap:3 &[0]:0xc0000ac040 // s> len:0 cap:0 // s> len:1 cap:1 &[0]:0xc00007e0e8
The first backing storage of s
have and address at 0xc0000ac040
, and after
we nil it and append new item, the backing storage change.
So, this means nil-ing a slice will release the previous backing storage and
create new backing storage when we append new item.
Second, we look how sub-slicing with cap 0.
s := []int{1, 2, 3} fmt.Printf("s> len:%d cap:%d &[0]:%p\n", len(s), cap(s), &s[0]) s = s[:0] fmt.Printf("s> len:%d cap:%d\n", len(s), cap(s)) s = append(s, 4) fmt.Printf("s> len:%d cap:%d &[0]:%p\n", len(s), cap(s), &s[0]) fmt.Printf(“s> %v\n”, s) // s> len:3 cap:3 &[0]:0xc0000144c0 // s> len:0 cap:3 // s> len:1 cap:3 &[0]:0xc0000144c0 // s> [4]
Zeroing slice using [:0]
reset the length to zero and keep and backing
storage.
Concolusion
The answer to above question is depends on how you use the slice,
-
use nil if we want to release slice’s (and its backing storage),
-
use
[:0]
if we want to keep the slice backing storage, to minimize reallocation. === Range on slice
The next questions regarding slice is when we do for-range loop on slice, does the second variable returned by range is a copy of item or a pointer to an item? Can we change the value inside for-range loop?
slice := []int{1, 2, 3} for _, item := range slice { item += 1 } fmt.Println(slice) // Output: // [1 2 3]
Let see their address,
slice := []int{1, 2, 3} for x, item := range slice { fmt.Printf("&slice[%d]:%p &item:%p\n", x, &slice[x], &item) item += 1 } // Output: // &slice[0]:0x40e020 &item:0x40e02c // &slice[1]:0x40e024 &item:0x40e02c // &slice[2]:0x40e028 &item:0x40e02c
Looking at the address, the item is a scope variable inside loop. Which means, each iteration in for-range loop, it will copy the value in slice into item. Changing the value of item will not change the value in slice.
Only when the slice type is pointer, we can change the value using for-range loop,
a := int(1) b := int(2) c := int(3) slice := []*int{&a, &b, &c} for x, item := range slice { fmt.Printf("&slice[%d]:%p &item:%p\n", x, &slice[x], &item) *item += 1 } fmt.Printf("Value of slice:") for x := range slice { fmt.Printf(" slice[%d]:%d", x, *slice[x]) } fmt.Println() fmt.Printf("Value of a:%d, b:%d, c:%d\n", a, b, c)
I think this is obvious for someone who familiar with pointer.
Slice gotchas
In this section we look what are commons mistake that we do when using slice.
Too much reallocation
Calling multiple append() on slices values may cause memory re-allocation.
In this example we print the length and capability of slice before and appending the slice.
func doX(in []int) (out []int){ for _, v := range in { fmt.Printf("before> out len:%d cap:%d\n", len(out), cap(out)) out = append(out, v) fmt.Printf("after > out len:%d cap:%d\n", len(out), cap(out)) } return out } doX([]int{1,2,3,4,5})
We found that doX
do 4 reallocation to slice out
,
// Output: 4 re-allocation before> out len:0 cap:0 after > out len:1 cap:1 before> out len:1 cap:1 after > out len:2 cap:2 before> out len:2 cap:2 after > out len:3 cap:4 before> out len:3 cap:4 after > out len:4 cap:4 before> out len:4 cap:4 after > out len:5 cap:8
The slice out
backing storage growth from 0 to 1, 2, 4, and 8.
To minimize this we can allocate the capability to the possible maximum values
that we may know.
Since we know that out
will at least take all length of in
, we can
allocate the initialize storage to len(in)
,
func doX(in []int) (out []int){ out = make([]int, 0, len(in)) for _, v := range in { out = append(out, v) } return out } doX([]int{1,2,3,4,5})
The allocation now decreased to 1 (on initial make),
// Output: 1 allocation before> out len:0 cap:5 after > out len:1 cap:5 before> out len:1 cap:5 after > out len:2 cap:5 before> out len:2 cap:5 after > out len:3 cap:5 before> out len:3 cap:5 after > out len:4 cap:5 before> out len:4 cap:5 after > out len:5 cap:5
The good news is we have static analysis tool for that: prealloc.
Unreleased memory allocation
A quote from go blog,
re-slicing a slice doesn’t make a copy of the underlying array. The full array will be kept in memory until it is no longer referenced. Occasionally this can cause the program to hold all the data in memory when only a small piece of it is needed.
The Go Blog - Go Slices: usage and internals
Given the following slicing statement,
msg.id = packet[0:4]
Memory allocated by packet
will not released until msg.id
get nil-ed or
msg
itself has no reference.
I avoid the term “memory leak” here, because technically part of the memory content is still in use, but not whole of it. The term “memory leak” is when we allocated it but forgot to free-it.
Just like reading Term of Services, sometimes we skip reading the content and looking only how to do X while forgot the internal detail.
Bad news is, AFAIK, there is no static analysis tool to help us with it. Your best friend right now is pprof.
Taking advantages of gotchas
If we knew that the original slices is cacheable or reusable, we can take advantage of it to minimize memory usage.
Case example, assume that we have a cacheable packet, that need to be parsed, checked, and validated; we can reuse the content by sub-slicing it.
Assume that a packet is sequences of characters with the following format,
key:value
we create a struct to store the key and value,
type Field struct { Key []byte Value []byte }
Common approaches when parsing it is by appending it one by one, and each Field’s key and name will allocated new slices.
field := Field{} packet := []byte("key:value") for _, c := range packet { if c == ':' { break } field.Key = append(field.Key, c) } field.Value = append(field.Value, packet[len(field.Key)+1:]...) fmt.Printf("Key: %s len:%d cap:%d\n", field.Key, len(field.Key), cap(field.Key)) fmt.Printf("Value: %s len:%d cap:%d\n", field.Value, len(field.Value), cap(field.Value)) fmt.Printf("packet storage: %p\n", &packet[0]) fmt.Printf("field.Key storage: %p\n", &field.Key[0]) fmt.Printf("field.Value storage: %p\n", &field.Value[0]) // Key: key len:3 cap:8 // Value: value len:5 cap:8 // packet storage: 0x40e020 // field.Key storage: 0x40e030 // field.Value storage: 0x40e038
In this approach the backing storage for packet
, Key
, and Value
are
different, and we have 4 * 2 re-allocation.
An alternative to minimize memory allocation is to use the original backing array and point the Field Key and Value into it.
var x := 0 // Get the beginning and end of index key for ; x < len(packet); x++ { if packet[x] == ':' break } } field.Key = packet[:x] field.Value = packet[x+1:]
With this approach all of the slice use single backing storage.
Remember, using this approach require a careful attention on where the instance of Field go and released.
References
[EFF-GO] The Go Authors, “Effective Go”, https://golang.org/doc/effective_go.html, February 2019.
[GO-SPEC] The Go Authors, “The Go Programming Language Specification”, https://golang.org/ref/spec, May 2018.
[GO-BLOG] Gerrand, Andrew, “The Go Blog - Go Slices: usage and internals”, https://blog.golang.org/go-slices-usage-and-internals, January 2011.