LK4D4 Blog


    Go Benchmarks

    Benchmarks

    Benchmarks are tests for performance. It’s pretty useful to have them in a project and to compare results from commit to commit. Go has very good tooling for writing and running benchmarks. In this article I’ll show how to write benchmarks with the testing package.

    How to write a benchmark

    It’s pretty easy in Go. Here is a simple benchmark:

    package bench
    
    import (
        "fmt"
        "testing"
    )
    
    func BenchmarkSample(b *testing.B) {
        for i := 0; i < b.N; i++ {
            if x := fmt.Sprintf("%d", 42); x != "42" {
                b.Fatalf("Unexpected string: %s", x)
            }
        }
    }
    

    Save this code to bench_test.go and run go test -bench=. bench_test.go. You’ll see something like this:

    testing: warning: no tests to run
    PASS
    BenchmarkSample 10000000               206 ns/op
    ok      command-line-arguments  2.274s
    
    

    So one iteration takes 206 nanoseconds. Easy, indeed. There are a couple more things to know about benchmarks in Go, though.
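    If you want these numbers outside of go test (say, from a small tool), the standard testing.Benchmark function runs a benchmark function and returns a testing.BenchmarkResult. A minimal sketch that reuses BenchmarkSample from above; runSample is just a hypothetical wrapper name:

```go
package main

import (
	"fmt"
	"testing"
)

// BenchmarkSample is the benchmark from above. Outside a _test.go file it is
// just an ordinary function with the benchmark signature.
func BenchmarkSample(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if x := fmt.Sprintf("%d", 42); x != "42" {
			b.Fatalf("Unexpected string: %s", x)
		}
	}
}

// runSample runs BenchmarkSample via testing.Benchmark, which grows b.N
// until the benchmark has run for about one second, like go test -bench.
func runSample() testing.BenchmarkResult {
	return testing.Benchmark(BenchmarkSample)
}

func main() {
	r := runSample()
	// String() has the iteration count and ns/op; MemString() has B/op and allocs/op.
	fmt.Println(r.String(), r.MemString())
}
```

    r.NsPerOp(), r.AllocsPerOp() and r.AllocedBytesPerOp() return the individual numbers if you want to compare them in code.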

    What can you benchmark?

    By default go test -bench=. measures only the speed of your code, but you can add the -benchmem flag, which also reports memory usage and the number of allocations. It’ll look like:

    PASS
    BenchmarkSample 10000000               208 ns/op              32 B/op          2 allocs/op
    
    

    Here we have bytes per operation and allocations per operation, which is pretty useful information to me. You can also enable these reports for a single benchmark with the b.ReportAllocs() method. But that’s not all: you can also report throughput by telling the benchmark how many bytes one operation processes, with the b.SetBytes(n int64) method. For example:

    func BenchmarkSample(b *testing.B) {
        b.SetBytes(2)
        for i := 0; i < b.N; i++ {
            if x := fmt.Sprintf("%d", 42); x != "42" {
                b.Fatalf("Unexpected string: %s", x)
            }
        }
    }
    

    Now the output will be:

    testing: warning: no tests to run
    PASS
    BenchmarkSample  5000000               324 ns/op           6.17 MB/s          32 B/op          2 allocs/op
    ok      command-line-arguments  1.999s
    

    Now you can see the throughput column, which is 6.17 MB/s in my case.
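    The MB/s column is simply the byte count passed to b.SetBytes divided by the time per operation, in decimal megabytes (1 MB = 1e6 bytes): 2 B / 324 ns ≈ 6.17 MB/s, which matches the output above. A tiny sketch of that arithmetic; mbPerSec is a hypothetical helper, while the testing package computes the same figure from the total bytes and the total elapsed time:

```go
package main

import "fmt"

// mbPerSec converts the value passed to b.SetBytes and the reported ns/op
// into decimal megabytes per second, as shown in the MB/s column.
func mbPerSec(bytesPerOp int64, nsPerOp float64) float64 {
	if bytesPerOp <= 0 || nsPerOp <= 0 {
		return 0
	}
	bytesPerSec := float64(bytesPerOp) / (nsPerOp / 1e9)
	return bytesPerSec / 1e6
}

func main() {
	// Numbers from the output above: b.SetBytes(2) and 324 ns/op.
	fmt.Printf("%.2f MB/s\n", mbPerSec(2, 324)) // prints "6.17 MB/s"
}
```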

    Benchmark setup

    What if you need to prepare your operation for each iteration? You definitely don’t want to include setup time in the benchmark result. I wrote a very simple Set data structure for benchmarking:

    type Set struct {
        set map[interface{}]struct{}
        mu  sync.Mutex
    }
    
    func (s *Set) Add(x interface{}) {
        s.mu.Lock()
        s.set[x] = struct{}{}
        s.mu.Unlock()
    }
    
    func (s *Set) Delete(x interface{}) {
        s.mu.Lock()
        delete(s.set, x)
        s.mu.Unlock()
    }
    
    and a benchmark for its Delete method:
    func BenchmarkSetDelete(b *testing.B) {
        var testSet []string
        for i := 0; i < 1024; i++ {
            testSet = append(testSet, strconv.Itoa(i))
        }
        for i := 0; i < b.N; i++ {
            set := Set{set: make(map[interface{}]struct{})}
            for _, elem := range testSet {
                set.Add(elem)
            }
            for _, elem := range testSet {
                set.Delete(elem)
            }
        }
    }
    

    There are a couple of problems here:

    • the time and allocations of creating testSet are counted once, before the first iteration (not a big problem here, because they are spread over a lot of iterations);
    • the time and allocations of the Add calls are counted in every iteration.

    For such cases there are b.ResetTimer(), b.StopTimer() and b.StartTimer(). Here they are used in the same benchmark:

    func BenchmarkSetDelete(b *testing.B) {
        var testSet []string
        for i := 0; i < 1024; i++ {
            testSet = append(testSet, strconv.Itoa(i))
        }
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            b.StopTimer()
            set := Set{set: make(map[interface{}]struct{})}
            for _, elem := range testSet {
                set.Add(elem)
            }
            b.StartTimer()
            for _, elem := range testSet {
                set.Delete(elem)
            }
        }
    }
    

    Now the setup won’t be counted in the benchmark results, and we’ll see only the cost of the Delete calls.

    Benchmarks comparison

    Of course, benchmarks are of little use if you can’t compare their results for different versions of the code.

    Here is example code that marshals a struct to JSON, and a benchmark for it:

    type testStruct struct {
        X int
        Y string
    }
    
    func (t *testStruct) ToJSON() ([]byte, error) {
        return json.Marshal(t)
    }
    
    func BenchmarkToJSON(b *testing.B) {
        tmp := &testStruct{X: 1, Y: "string"}
        js, err := tmp.ToJSON()
        if err != nil {
            b.Fatal(err)
        }
        b.SetBytes(int64(len(js)))
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            if _, err := tmp.ToJSON(); err != nil {
                b.Fatal(err)
            }
        }
    }
    

    It’s already committed to git. Now I want to try a cool trick and measure its performance, so I slightly modify the ToJSON method:

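    func (t *testStruct) ToJSON() ([]byte, error) {
        // hand-built JSON: fast, but Y is not escaped, so this is only correct
        // for strings without quotes, backslashes or control characters
        return []byte(`{"X": ` + strconv.Itoa(t.X) + `, "Y": "` + t.Y + `"}`), nil
    }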
    

    Now it’s time to run our benchmarks; let’s save their results to files this time:

    go test -bench=. -benchmem bench_test.go > new.txt
    git stash
    go test -bench=. -benchmem bench_test.go > old.txt
    

    Now we can compare the results with the benchcmp utility. You can install it with go get golang.org/x/tools/cmd/benchcmp. Here is the result of the comparison:

    # benchcmp old.txt new.txt
    benchmark           old ns/op     new ns/op     delta
    BenchmarkToJSON     1579          495           -68.65%
    
    benchmark           old MB/s     new MB/s     speedup
    BenchmarkToJSON     12.66        46.41        3.67x
    
    benchmark           old allocs     new allocs     delta
    BenchmarkToJSON     2              2              +0.00%
    
    benchmark           old bytes     new bytes     delta
    BenchmarkToJSON     184           48            -73.91%
    

    It’s very nice to see such tables, and they can also add weight to your open source contributions.
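    The delta and speedup columns are plain arithmetic: delta is (new - old) / old as a percentage, and speedup is new MB/s divided by old MB/s. A small sketch with hypothetical helper names, fed with the numbers from the tables above:

```go
package main

import "fmt"

// deltaPercent is the relative change from oldV to newV in percent, as in the
// "delta" columns (negative means the new value is smaller).
func deltaPercent(oldV, newV float64) float64 {
	return (newV - oldV) / oldV * 100
}

// speedup is the ratio of new to old throughput, as in the "speedup" column.
func speedup(oldMBs, newMBs float64) float64 {
	return newMBs / oldMBs
}

func main() {
	fmt.Printf("ns/op delta: %+.2f%%\n", deltaPercent(1579, 495)) // -68.65%
	fmt.Printf("bytes delta: %+.2f%%\n", deltaPercent(184, 48))   // -73.91%
	fmt.Printf("speedup: %.2fx\n", speedup(12.66, 46.41))         // 3.67x
}
```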

    Writing profiles

    You can also write CPU and memory profiles from benchmarks:

    go test -bench=. -benchmem -cpuprofile=cpu.out -memprofile=mem.out bench_test.go
    

    You can read how to analyze profiles in an awesome post on blog.golang.org here.

    Conclusion

    Benchmarks are an awesome tool for programmers, and in Go writing and analyzing benchmarks is extremely easy. New benchmarks let you find performance bottlenecks, weird code (efficient code is often simpler and more readable) or the use of the wrong tools. Existing benchmarks make you more confident in your changes and can be another +1 in the review process. So writing benchmarks has enormous benefits for both programmers and code, and I encourage you to write more. It’s fun!

    Sep 24, 2015

    Mystery of finalizers in Go

    Finalizers

    A finalizer is basically a function that is called when your object has lost all references and has been found by the GC. In Go you can add finalizers to your objects with the runtime.SetFinalizer function. Let’s see how it works.

    package main
    
    import (
        "fmt"
        "runtime"
        "time"
    )
    
    type Test struct {
        A int
    }
    
    func test() {
        // create pointer
        a := &Test{}
        // add finalizer which just prints
        runtime.SetFinalizer(a, func(a *Test) { fmt.Println("I AM DEAD") })
    }
    
    func main() {
        test()
        // run garbage collection
        runtime.GC()
        // sleep to switch to finalizer goroutine
        time.Sleep(1 * time.Millisecond)
    }
    
    The output should be:

    I AM DEAD
    

    So, we created an object a, which is a pointer, and set a simple finalizer on it. When the code leaves the test function, all references to the object disappear, so the garbage collector is able to collect it and call the finalizer in its own goroutine. If you modify test() to return the *Test and print it in main(), you’ll see that the finalizer isn’t called. The same happens if you remove the A field from Test: Test then becomes an empty struct, and an empty struct allocates no memory, so there is nothing for the GC to collect.

    Finalizers examples

    Let’s look for uses of finalizers in the standard library. There they are mostly used for closing file descriptors, like this in the net package:

    runtime.SetFinalizer(fd, (*netFD).Close)
    
    So a forgotten Close on a net.Conn doesn’t leak the fd forever: it gets closed once the GC finds the connection unreachable.

    So finalizers are probably not such a good idea if even the standard library uses them so sparingly. Let’s see what the problems are.

    Why you should avoid finalizers

    Finalizers are a pretty tempting idea if you come from languages without GC, or where you don’t expect users to write proper code. In Go we have both GC and pro users :) So, in my opinion, an explicit call of Close is always better than a magic finalizer. For example, there is a finalizer for the fd in os:

    // NewFile returns a new File with the given file descriptor and name.
    func NewFile(fd uintptr, name string) *File {
        fdi := int(fd)
        if fdi < 0 {
            return nil
        }
        f := &File{&file{fd: fdi, name: name}}
        runtime.SetFinalizer(f.file, (*file).close)
        return f
    }
    
    NewFile is called by OpenFile, which is called by Open, so if you’re opening a file you’ll hit that code. The problem with finalizers is that you have no control over them, and, more than that, you don’t expect them. Look at this code:
    func getFd(path string) (int, error) {
        f, err := os.Open(path)
        if err != nil {
            return -1, err
        }
        // Fd returns a uintptr; f itself is not returned and becomes garbage
        return int(f.Fd()), nil
    }
    
    Getting a file descriptor from a path is a pretty common operation when you’re writing stuff for Linux. But this code is unreliable: when you return from getFd, f loses its last reference, so your file is doomed to be closed sooner or later (when the next GC cycle comes). The problem here is not that the file will be closed, but that this isn’t documented and isn’t expected at all.

    Conclusion

    I think it’s better to assume that users are smart enough to clean up objects themselves. At the very least, all methods which call SetFinalizer should document it, but personally I don’t see any value in SetFinalizer for me.

    Aug 26, 2015

    Unprivileged containers in Go, Part4: Network namespace

    Network namespace

    From man namespaces:

    Network  namespaces  provide  isolation of the system resources associated with
    networking: network devices, IPv4 and IPv6 protocol stacks, IP routing tables,
    firewalls, the /proc/net directory, the /sys/class/net directory, port numbers
    (sockets), and so on.  A physical network device can live in exactly one
    network namespace.
    A  virtual  network  device ("veth") pair provides a pipe-like abstraction that
    can be used to create tunnels between network namespaces, and can be used to
    create a bridge to a physical network device in another namespace.
    

    A network namespace lets you isolate the network stack of your container. Note that it doesn’t include the hostname: that’s the job of the UTS namespace.

    We can create a network namespace with the syscall.CLONE_NEWNET flag in SysProcAttr.Cloneflags. Right after creation, the namespace has only autogenerated network interfaces (in most cases just the loopback interface). So we need to inject a network interface into the namespace that allows the container to talk to other containers. We will use veth pairs for this, as mentioned in the man page. It’s not the only way and probably not the best, but it is the best known one and the one Docker uses by default.

    Unet

    To create the interfaces we will need a new binary with the suid bit set, because these are pretty privileged operations. We could create them with the awesome iproute2 set of utilities, but I decided to write everything in Go, because it’s fun and I want to promote the awesome netlink library: with it you can do pretty much any networking operation.

    I called the new binary unet; you can find it in the unc repo: https://github.com/LK4D4/unc/tree/master/unet

    Bridge

    First of all we need to create a bridge. Here is the code from unet that sets it up:

    const (
        bridgeName = "unc0"
        ipAddr     = "10.100.42.1/24"
    )
    
    func createBridge() error {
        // try to get bridge by name, if it already exists then just exit
        _, err := net.InterfaceByName(bridgeName)
        if err == nil {
            return nil
        }
        if !strings.Contains(err.Error(), "no such network interface") {
            return err
        }
        // create *netlink.Bridge object
        la := netlink.NewLinkAttrs()
        la.Name = bridgeName
        br := &netlink.Bridge{LinkAttrs: la}
        if err := netlink.LinkAdd(br); err != nil {
            return fmt.Errorf("bridge creation: %v", err)
        }
        // set up ip address for bridge
        addr, err := netlink.ParseAddr(ipAddr)
        if err != nil {
            return fmt.Errorf("parse address %s: %v", ipAddr, err)
        }
        if err := netlink.AddrAdd(br, addr); err != nil {
            return fmt.Errorf("add address %v to bridge: %v", addr, err)
        }
        // sets up bridge ( ip link set dev unc0 up )
        if err := netlink.LinkSetUp(br); err != nil {
            return err
        }
        return nil
    }
    

    I hardcoded the bridge name and IP address for simplicity. Next we need to create a veth pair, attach one side of it to the bridge and put the other side into our namespace. We will identify the namespace by PID:

    const vethPrefix = "uv"
    
    func createVethPair(pid int) error {
        // get bridge to set as master for one side of veth-pair
        br, err := netlink.LinkByName(bridgeName)
        if err != nil {
            return err
        }
        // generate names for interfaces
        x1, x2 := rand.Intn(10000), rand.Intn(10000)
        parentName := fmt.Sprintf("%s%d", vethPrefix, x1)
        peerName := fmt.Sprintf("%s%d", vethPrefix, x2)
        // create *netlink.Veth
        la := netlink.NewLinkAttrs()
        la.Name = parentName
        la.MasterIndex = br.Attrs().Index
        vp := &netlink.Veth{LinkAttrs: la, PeerName: peerName}
        if err := netlink.LinkAdd(vp); err != nil {
            return fmt.Errorf("veth pair creation %s <-> %s: %v", parentName, peerName, err)
        }
        // get peer by name to put it to namespace
        peer, err := netlink.LinkByName(peerName)
        if err != nil {
            return fmt.Errorf("get peer interface: %v", err)
        }
        // put peer side to network namespace of specified PID
        if err := netlink.LinkSetNsPid(peer, pid); err != nil {
            return fmt.Errorf("move peer to ns of %d: %v", pid, err)
        }
        if err := netlink.LinkSetUp(vp); err != nil {
            return err
        }
        return nil
    }
    
    After all this we will have a “pipe” from the container to the bridge unc0. But it’s not that easy: don’t forget that we are talking about unprivileged containers, so we need to run all the code as an unprivileged user, yet this particular part must be executed with root rights. We can set the suid bit for this, which allows an unprivileged user to run that binary with root privileges. I did the following:

    $ go get github.com/LK4D4/unc/unet
    $ su
    $ chown root:root $GOPATH/bin/unet
    $ chmod u+s $GOPATH/bin/unet
    $ ln -s $GOPATH/bin/unet /usr/bin/unet
    

    That’s all you need to run this binary. Actually, you don’t need to run it yourself; unc will do it :)

    Waiting for interface

    Now we can create interfaces in the namespace of a specified PID. But a process expects the network to be ready when it starts, so in the fork part of the program we somehow need to wait until unet has created the interface, before calling syscall.Exec. I used a pretty simple idea for this: just poll the interface list until the first veth device appears. Let’s modify our container.Start to put the interface into the namespace after we start the fork process:

    -       return cmd.Run()
    +       if err := cmd.Start(); err != nil {
    +               return err
    +       }
    +       logrus.Debugf("container PID: %d", cmd.Process.Pid)
    +       if err := putIface(cmd.Process.Pid); err != nil {
    +               return err
    +       }
    +       return cmd.Wait()
    
    

    The putIface function just calls unet with the PID as its argument:

    const suidNet = "unet"
    
    func putIface(pid int) error {
        cmd := exec.Command(suidNet, strconv.Itoa(pid))
        out, err := cmd.CombinedOutput()
        if err != nil {
            return fmt.Errorf("unet: out: %s, err: %v", out, err)
        }
        return nil
    }
    
    Now let’s look at the code that waits for the interface inside the fork process:
    func waitForIface() (netlink.Link, error) {
        logrus.Debugf("Starting to wait for network interface")
        start := time.Now()
        for {
            fmt.Printf(".")
            if time.Since(start) > 5*time.Second {
                fmt.Printf("\n")
                return nil, fmt.Errorf("failed to find veth interface in 5 seconds")
            }
            // get list of all interfaces
            lst, err := netlink.LinkList()
            if err != nil {
                fmt.Printf("\n")
                return nil, err
            }
            for _, l := range lst {
                // if we found "veth" interface - it's time to continue setup
                if l.Type() == "veth" {
                    fmt.Printf("\n")
                    return l, nil
                }
            }
            time.Sleep(100 * time.Millisecond)
        }
    }
    
    We need to call this function before execProc in fork. Now we have a veth interface, and we can continue with its setup and the process execution.

    Network setup

    Now the easiest part: we just need to assign an IP to our new interface and bring it up. I added an IP field to the Cfg struct:

    type Cfg struct {
            Hostname string
            Mounts   []Mount
            Rootfs   string
    +       IP       string
     }
    

    and filled it with a pseudorandom IP from the bridge subnet (10.100.42.0/24):

    const ipTmpl = "10.100.42.%d/24"
    defaultCfg.IP = fmt.Sprintf(ipTmpl, rand.Intn(253)+2)
    

    Here is the code for the network setup:

    func setupIface(link netlink.Link, cfg Cfg) error {
        // up loopback
        lo, err := netlink.LinkByName("lo")
        if err != nil {
            return fmt.Errorf("lo interface: %v", err)
        }
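        if err := netlink.LinkSetUp(lo); err != nil {
            return fmt.Errorf("up lo: %v", err)
        }
        // assign the container IP to the veth end
        addr, err := netlink.ParseAddr(cfg.IP)
        if err != nil {
            return fmt.Errorf("parse IP: %v", err)
        }
        if err := netlink.AddrAdd(link, addr); err != nil {
            return fmt.Errorf("add address %v: %v", addr, err)
        }
        // bring the veth end up ( ip link set dev <name> up )
        return netlink.LinkSetUp(link)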
    }
    
    That’s all; now we can exec our process.

    Talking containers

    Let’s try to connect our containers. I assume here that we’re in a directory with a busybox rootfs:

    $ unc sh
    $ ip a
    1: lo: <LOOPBACK> mtu 65536 qdisc noop
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: sit0@NONE: <NOARP> mtu 1480 qdisc noop
        link/sit 0.0.0.0 brd 0.0.0.0
    475: uv5185@if476: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether e2:2b:71:19:73:73 brd ff:ff:ff:ff:ff:ff
        inet 10.100.42.24/24 scope global uv5185
           valid_lft forever preferred_lft forever
        inet6 fe80::e02b:71ff:fe19:7373/64 scope link
           valid_lft forever preferred_lft forever
    
    $ unc ping -c 1 10.100.42.24
    PING 10.100.42.24 (10.100.42.24): 56 data bytes
    64 bytes from 10.100.42.24: seq=0 ttl=64 time=0.071 ms
    
    --- 10.100.42.24 ping statistics ---
    1 packets transmitted, 1 packets received, 0% packet loss
    round-trip min/avg/max = 0.071/0.071/0.071 ms
    

    They can talk! It’s like magic, right? You can find all the code under the netns tag.

    The end

    This is the last post about unprivileged containers (at least about namespaces). We created an isolated environment for a process, which you can run as an unprivileged user. Containers, though, are a little more than just isolation: you also want to specify what a process can do inside the container (Linux capabilities), how many resources it can use (cgroups), and you can imagine many other things. I invite you to look at what we have in runc/libcontainer; it’s not very easy code, but I hope that after my posts you will be able to understand it. If you have any questions, feel free to write to me; I’m always happy to share my humble knowledge about containers.

    Previous parts:

    Unprivileged containers in Go, Part3: Mount namespace

    Mount namespace

    From man namespaces:

    Mount namespaces isolate the set of filesystem mount points, meaning that
    processes in different mount namespaces can have different views of the
    filesystem hierarchy. The set of mounts in a mount namespace is modified using
    mount(2) and umount(2).
    

    So, a mount namespace lets you give your process a different set of mounts. You can have a separate /proc, /dev, etc. It’s as easy as passing one more flag to SysProcAttr.Cloneflags: syscall.CLONE_NEWNS. It has such a weird name because it was the first namespace introduced, and nobody expected that there would be more. So if you see CLONE_NEWNS, know that it means the mount namespace. Let’s try to enter our container with a new mount namespace. We’ll see all the same mounts as on the host. That’s because a new mount namespace receives a copy of the parent namespace’s mount table as its initial mount table. In our case we’re pretty restricted in what we can do with these mounts; for example, we can’t unmount anything:

    $ umount /proc
    umount: /proc: not mounted
    

    That’s because we use an “unprivileged” namespace. But we can mount a new /proc over the old one:

    mount -t proc none /proc
    

    Now you can see that ps shows only your processes. To get rid of the host mounts and have a nice clean mount table, we can use the pivot_root syscall to change the root from the host root to another one. But first we need to write some code to actually mount something into the new rootfs.

    Mounting inside root file system

    For the next steps we will need a root filesystem to test with. I will use a busybox one, because it’s very small but useful. You can take the busybox rootfs from the official Docker image here. Just unpack it into a busybox directory somewhere:

    $ mkdir busybox
    $ cd busybox
    $ wget https://github.com/jpetazzo/docker-busybox/raw/master/rootfs.tar
    $ tar xvf rootfs.tar
    

    Now that we have a rootfs, we need to mount some stuff inside it. Let’s create a data structure describing mounts:

    type Mount struct {
        Source string
        Target string
        Fs     string
        Flags  int
        Data   string
    }
    
    It is just the arguments of syscall.Mount in the form of a struct. Now we can add some mounts and the path to the rootfs (for unc it will just be the current directory), in addition to the hostname, to our Cfg struct:
    type Cfg struct {
        Path     string
        Args     []string
        Hostname string
        Mounts   []Mount
        Rootfs   string
    }
    
    For a start I added /proc (to see the process tree of the new PID namespace; by the way, you can’t mount /proc without a PID namespace) and /dev:
        Mounts: []Mount{
            {
                Source: "proc",
                Target: "/proc",
                Fs:     "proc",
                Flags:  defaultMountFlags,
            },
            {
                Source: "tmpfs",
                Target: "/dev",
                Fs:     "tmpfs",
                Flags:  syscall.MS_NOSUID | syscall.MS_STRICTATIME,
                Data:   "mode=755",
            },
        },
    

    The mounting function is very simple; we just iterate over the mounts and call syscall.Mount:

    func mount(cfg Cfg) error {
        for _, m := range cfg.Mounts {
            target := filepath.Join(cfg.Rootfs, m.Target)
            if err := syscall.Mount(m.Source, target, m.Fs, uintptr(m.Flags), m.Data); err != nil {
                return fmt.Errorf("failed to mount %s to %s: %v", m.Source, target, err)
            }
        }
        return nil
    }
    

    Now we have something mounted inside our new rootfs. Time to pivot_root to it.

    Pivot root

    From man 2 pivot_root:

    int pivot_root(const char *new_root, const char *put_old);
    ...
    pivot_root() moves the root filesystem of the calling process to the directory
    put_old and makes new_root the new root filesystem of the calling process.
    
    ...
    
           The following restrictions apply to new_root and put_old:
    
           -  They must be directories.
    
           -  new_root and put_old must not be on the same filesystem as the current root.
    
           -  put_old must be underneath new_root, that is, adding a nonzero number
              of /.. to the string pointed to by put_old must yield the same directory as new_root.
    
           -  No other filesystem may be mounted on put_old.
    

    So it takes the current root, moves it to put_old with all its mounts, and makes new_root the new root. pivot_root is more secure than chroot; it’s pretty hard to escape from it. Sometimes pivot_root doesn’t work (for example, on Android systems, because of their special kernel loading process); then you need to mount to “/” with the MS_MOVE flag and chroot there. We won’t discuss that case here.

    Here is the function which we will use for changing root:

    func pivotRoot(root string) error {
        // we need this to satisfy restriction:
        // "new_root and put_old must not be on the same filesystem as the current root"
        if err := syscall.Mount(root, root, "bind", syscall.MS_BIND|syscall.MS_REC, ""); err != nil {
            return fmt.Errorf("Mount rootfs to itself error: %v", err)
        }
        // create rootfs/.pivot_root as path for old_root
        pivotDir := filepath.Join(root, ".pivot_root")
        if err := os.Mkdir(pivotDir, 0777); err != nil {
            return err
        }
        // pivot_root to rootfs, now old_root is mounted in rootfs/.pivot_root
        // mounts from it still can be seen in `mount`
        if err := syscall.PivotRoot(root, pivotDir); err != nil {
            return fmt.Errorf("pivot_root %v", err)
        }
        // change working directory to /
        // it is recommendation from man-page
        if err := syscall.Chdir("/"); err != nil {
            return fmt.Errorf("chdir / %v", err)
        }
        // path to pivot root now changed, update
        pivotDir = filepath.Join("/", ".pivot_root")
        // umount rootfs/.pivot_root(which is now /.pivot_root) with all submounts
        // now we have only mounts that we mounted ourselves in `mount`
        if err := syscall.Unmount(pivotDir, syscall.MNT_DETACH); err != nil {
            return fmt.Errorf("unmount pivot_root dir %v", err)
        }
        // remove temporary directory
        return os.Remove(pivotDir)
    }
    
    I hope everything is clear from the comments; let me know if not. That is all the code you need to have your own unprivileged container with its own rootfs. You can try other root filesystems from Docker image sources; for example, Alpine Linux is a pretty exciting distribution. You can also try to mount more things inside the container.

    That’s all for today. The tag for this article on GitHub is mnt_ns. Remember that you should run unc as an unprivileged user and from the directory that contains the rootfs. Here are examples of some commands inside the container (excluding logging):

    $ unc cat /etc/os-release
    NAME=Buildroot
    VERSION=2015.02
    ID=buildroot
    VERSION_ID=2015.02
    PRETTY_NAME="Buildroot 2015.02"
    
    $ unc mount
    /dev/sda3 on / type ext4 (rw,noatime,nodiratime,nobarrier,errors=remount-ro,commit=600)
    proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
    tmpfs on /dev type tmpfs (rw,nosuid,nodev,mode=755,uid=1000,gid=1000)
    
    $ unc ps awxu
    PID   USER     COMMAND
        1 root     ps awxu
    

    Looks pretty “container-ish” I think :)

    Previous parts:

    Unprivileged containers in Go, Part2: UTS namespace (setup namespaces)

    Setup namespaces

    In the previous part we created some namespaces and executed a process in them. It was cool, but in the real world we need to set up namespaces before the process starts: set up mounts, chroot, set the hostname, create network interfaces, etc. We need this because we can’t expect the user process to do it; it wants everything ready when it executes.

    So, in our case we want to insert some code after the namespaces are created, but before the process is executed. In C it’s pretty easy to do, because there is the clone call. It’s not so easy in Go (but still easy, really): we need to spawn a new process running our own code in the new namespaces. We can do it by executing our own binary again with different arguments.

    Look at this code:

        cmd := &exec.Cmd{
            Path: os.Args[0],
            Args: append([]string{"unc-fork"}, os.Args[1:]...),
        }
    

    Here we create an *exec.Cmd that calls the same binary with the same arguments as the caller, but replaces os.Args[0] with the string unc-fork (yes, you can specify any os.Args[0], not only the program name). It will be our keyword, indicating that we want to set up namespaces and execute the process.

    So, let’s insert the following lines at the top of the main() function:

        if os.Args[0] == "unc-fork" {
            if err := fork(); err != nil {
                log.Fatalf("Fork error: %v", err)
            }
            os.Exit(0)
        }
    

    This means: in the special case of os.Args[0] being unc-fork, execute the fork() function and exit.

    Let’s write fork() now:

    func fork() error {
        fmt.Println("Start fork")
        path, err := exec.LookPath(os.Args[1])
        if err != nil {
            return fmt.Errorf("LookPath: %v", err)
        }
        fmt.Printf("Execute %s\n", append([]string{path}, os.Args[2:]...))
        return syscall.Exec(path, os.Args[1:], os.Environ())
    }
    

    It’s the simplest possible fork() function; it just prints some messages before starting the process. Let’s look at the os.Args array here. For example, if we wanted to spawn sh -c "echo hello" in namespaces, the array now looks like ["unc-fork", "sh", "-c", "echo hello"]. We resolve "sh" to "/bin/sh" and call

    syscall.Exec("/bin/sh", []string{"sh", "-c", "echo hello"}, os.Environ())
    

    syscall.Exec calls the execve syscall; you can read more about it in man execve. It receives the path to the binary, the arguments and an array of environment variables. Here we just pass all variables down to the process, but we could change them in fork() too.

    UTS namespace

    Let’s do some real work in our shiny new function and set up a hostname for our “container” (by default it inherits the host’s hostname). Let’s add the following lines to fork():

        if err := syscall.Sethostname([]byte("unc")); err != nil {
            return fmt.Errorf("Sethostname: %v", err)
        }
    

    If we try to execute this code we’ll get:

    Fork error: Sethostname: operation not permitted
    

    because we’re trying to change the hostname in the host’s UTS namespace.

    From man namespaces:

    UTS  namespaces  provide  isolation  of two system identifiers: the hostname and the NIS domain name.
    

    So let’s isolate our hostname from the host’s hostname. We can create our own UTS namespace by adding syscall.CLONE_NEWUTS to Cloneflags. Now we’ll see the hostname successfully changed:

    $ unc hostname
    unc
    

    Code

    The tag on GitHub for this article is uts_setup; it can be found here. I added some functions to separate the steps and created a Cfg struct in the container.go file, so that later we can change the container configuration in one place. I also added logging with the awesome logrus library.

    Thanks for reading! I hope to see you next week in the part about mount namespaces; it’ll be very interesting.

    Previous parts:

    Unprivileged containers in Go, Part1: User and PID namespaces

    Unprivileged namespaces

    Unprivileged (or user) namespaces are Linux namespaces that can be created by an unprivileged (non-root) user. This is possible only with user namespaces. You can find exhaustive information about user namespaces in the man page (man user_namespaces). Basically, to create your namespaces you need to create a user namespace first. The kernel can take care of creating namespaces in the right order for you, so you can just pass a bunch of flags to clone: the user namespace is always created first and owns the other namespaces.

    User namespace

    In a user namespace you can map users and groups from the host into the namespace, so, for example, your user with uid 1000 can be uid 0 (root) inside the namespace.

    Mrunal Patel added user and group mapping support to Go, and Go 1.4.0 includes it. Unfortunately, a security fix in Linux kernel 3.18 prevents an unprivileged user from setting group mappings without first disabling the setgroups syscall. I fixed this in Go, and the fix will be released in 1.5.0 (UPD: already released!).

    To execute a process in a new user namespace, you need to create an *exec.Cmd like this:

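    cmd := exec.Command(binary, args...)
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUSER,
        // Uid and Gid hold the current user's IDs on the host,
        // e.g. os.Getuid() and os.Getgid().
        UidMappings: []syscall.SysProcIDMap{
            {
                ContainerID: 0,   // root inside the namespace...
                HostID:      Uid, // ...is this user on the host
                Size:        1,
            },
        },
        GidMappings: []syscall.SysProcIDMap{
            {
                ContainerID: 0,
                HostID:      Gid,
                Size:        1,
            },
        },
    }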
    

    Here you can see the syscall.CLONE_NEWUSER flag in SysProcAttr.Cloneflags, which just means “please create a new user namespace for this process”; other namespaces can be requested there too. The mapping fields speak for themselves. Size is the size of a range of mapped IDs, so you can remap many IDs without specifying each one.

    PID namespaces

    From man pid_namespaces:

    PID namespaces isolate the process ID number space
    

    That’s it: your process in this namespace has PID 1, which is sort of cool: you are like systemd, but better. In this first part, ps awux will still show the host’s processes, not only ours, because for that we need a mount namespace and a remounted /proc, but you can still see PID 1 with echo $$.

    First unprivileged container

    I am pretty bad at writing long texts, so I decided to split container creation into several parts. Today we will see only user and PID namespace creation, which is still pretty impressive. So, to add a PID namespace we need to modify Cloneflags:

        Cloneflags: syscall.CLONE_NEWUSER | syscall.CLONE_NEWPID
    

    For these articles, I created a project on GitHub: https://github.com/LK4D4/unc. unc means “unprivileged container” and has nothing in common with runc (maybe only a little). I will tag the code for each article in the repo. The tag for this article is user_pid. Just compile it with go1.5 and try running different commands in namespaces as an unprivileged user:

    $ unc sh
    $ unc whoami
    $ unc sh -c "echo \$\$"
    

    It doesn’t do anything fancy: it just connects your standard streams to the executed process and executes it in new namespaces, remapping the current user and group to the root user and group inside the user namespace. Please read all the code; there is not much of it for now.

    Next steps

    The most interesting part of containers is the mount namespace. It allows you to have mounts separate from the host (/proc, for example). Another interesting namespace is the network namespace; it is a little tough for an unprivileged user, because you need to create network interfaces on the host first, which requires some superpowers from root. In the next article, I hope to cover the mount namespace, so that we get a real container with its own root filesystem.

    Thanks for reading! I am learning all this stuff myself right now by writing these articles, so if you have something to say, please feel free to comment!

    Developing Arduino with Docker

    I’m using Gentoo, and using Arduino on Gentoo isn’t very easy: Arduino on Gentoo Linux.

    It is easy with Docker, though. Let’s see how we can upload our first program to an Arduino Uno without installing anything apart from Docker.

    Kernel Modules

    For Arduino Uno I need to enable

    Device Drivers -> USB support -> USB Modem (CDC ACM) support
    

    as a module.

    Then I compile and load it with

    make modules && make modules_install && modprobe cdc-acm
    

    in my /usr/src/linux. Finally, I connect the Arduino and see it as /dev/ttyACM0.

    Installing ino

    For this we just need an image from hub.docker.com:

    docker pull coopermaa/ino
    

    It’s slightly outdated, but I sent a PR to use a new base image, because that’s how we do things in the open source world. Anyway, it works great. Let’s create a script for calling ino through Docker; add the following script to your $PATH

    #!/bin/sh
    docker run --rm --privileged --device=/dev/ttyACM0 -v "$(pwd)":/app coopermaa/ino "$@"
    

    and call it ino. Don’t forget

    chmod +x ino
    

    Alternatively, you can use an alias in .bashrc:

    alias ino='docker run --privileged \
      --rm \
      --device=/dev/ttyACM0 \
      -v $(pwd):/app \
      coopermaa/ino'
    

    but the script worked better with my vim.

    Uploading program

    Let’s create a program from a template and upload it to the board:

    $ mkdir blink && cd blink
    $ ino init -t blink
    $ ino build && ino upload
    

    Whoa! It’s alive!

    Vim integration

    I’m using the Vim plugin for ino; you can easily install it with any plugin manager for vim. You don’t need anything special, it’ll just work. You can compile and upload your sketch with <Leader>ad.

    Known issues

    To use ino serial, you need to add -t to the docker run arguments in your script. It works pretty weirdly, though: you need to kill the process /usr/bin/python /usr/local/bin/ino serial by hand every time, but it works and doesn’t look so bad.

    Also, files created by ino init will belong to root, which isn’t very convenient.

    That’s all!

    Thank you for reading, and special thanks to coopermaa for the ino image.

    30 days of hacking Docker

    Prelude

    Yesterday I finished my first 30-day streak on GitHub. Most of my contributions were to Docker, the biggest open source project written in Go. I learned a lot this month, and it was really cool. I think this is mostly because of the Go language. I’ve been programming in Python for five years and I was never so excited about open source, because Python is not even half as fun as Go.

    1. Tools

    There are a lot of tools for Go, and some of them are just “must have”.

    Goimports - like go fmt, but with cool import handling. I really think that go fmt should be replaced with Goimports in future Go versions.

    Vet - analyzes code for suspicious constructs. With it you can find bad format strings, unreachable code, mutexes passed by value, etc. PR about vet errors in Docker.

    Golint - checks code against Go style conventions.

    2. Editor

    I love my awesome vim with the awesome vim-go plugin, which is integrated with the tools mentioned above. It formats code for me, adds needed imports, removes unused imports, shows documentation, supports tagbar and more. And my favourite: go to definition. I really suffered without it :) With vim-go my development became faster than I could imagine. You can see my config in my dotfiles repo.

    3. Race detector

    This is one of the most important and most underestimated tools. It is very useful and very easy to use: just add the -race flag to go test, go run or go build. You can find a description and examples here. I’ve found many race conditions with this tool (#1, #2, #3, #4, #5).
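    A minimal illustration, not one of the Docker races linked above: the counter below is guarded by a sync.Mutex. If you remove the Lock/Unlock calls in inc, running the program with go run -race should report a data race on c.n, and the final count may come out lower than expected.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is safe for concurrent use because every access to n holds mu.
type counter struct {
	mu sync.Mutex
	n  int
}

func (c *counter) inc() {
	c.mu.Lock()
	c.n++ // without the mutex, this read-modify-write is a data race
	c.mu.Unlock()
}

// countConcurrently increments one counter from several goroutines and
// returns the final value once they have all finished.
func countConcurrently(workers, perWorker int) int {
	var c counter
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				c.inc()
			}
		}()
	}
	wg.Wait() // all increments happen before this returns
	return c.n
}

func main() {
	fmt.Println(countConcurrently(8, 1000)) // prints 8000
}
```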

    4. Docker specific

    Docker has a very smart and friendly community. You can always ask for help with hacking in #docker-dev on Freenode. But I’ll describe some simple tasks that come up when you try to hack on Docker for the first time.

    Tests

    There are three kinds of tests in the docker repo:

    • unit - unit tests (ah, we all know what unit tests are, right?). These tests are spread all over the repository and can be run with make test-unit. You can run tests for one directory by specifying it in the TESTDIRS variable. For example

      TESTDIRS="daemon" make test-unit
      

      will run tests only for the daemon directory.

    • integration-cli - integration tests that use external docker commands (for example docker build, docker run, etc.). This kind of test is very easy to write, and you should write one if you think that your changes can affect Docker’s behavior from the client’s point of view. These tests are located in the integration-cli directory and can be run with make test-integration-cli. You can run one or more specific tests by setting the TESTFLAGS variable. For example

      TESTFLAGS="-run TestBuild" make test-integration-cli
      

      will run all tests whose names start with TestBuild.

    • integration - integration tests that use internal docker data structures. They are deprecated now, so if you want to write tests you should prefer integration-cli or unit. These tests are located in the integration directory and can be run with make test-integration.

    All tests can be run by make test.

    Build and run tests on host

    All make commands are executed in a Docker container, and it can be pretty annoying to build a container just to run unit tests, for example.

    So, to run unit tests on the host machine, you need a canonical Go workspace. When it’s ready, you can just symlink the docker repo to src/github.com/dotcloud/docker. But we still need the right $GOPATH; here is the trick:

    export GOPATH=<workspace>/src/github.com/dotcloud/docker/vendor:<workspace>
    

    And then, for example you can run:

    go test github.com/dotcloud/docker/daemon/networkdriver/ipallocator
    

    Some tests require external libraries, for example libdevmapper; you can disable them with the DOCKER_BUILDTAGS environment variable. For example:

    export DOCKER_BUILDTAGS='exclude_graphdriver_devicemapper exclude_graphdriver_aufs'
    

    To quickly build a dynamic binary, you can use this snippet in the docker repo:

    export AUTO_GOPATH=1
    export DOCKER_BUILDTAGS='exclude_graphdriver_devicemapper exclude_graphdriver_aufs'
    hack/make.sh dynbinary
    

    I use that DOCKER_BUILDTAGS value on my btrfs system, so if you use aufs or devicemapper you should change it for your driver.

    Race detection

    To enable race detection in Docker, I use this patch:

    diff --git a/hack/make/binary b/hack/make/binary
    index b97069a..74b202d 100755
    --- a/hack/make/binary
    +++ b/hack/make/binary
    @@ -6,6 +6,7 @@ DEST=$1
     go build \
            -o "$DEST/docker-$VERSION" \
            "${BUILDFLAGS[@]}" \
    +       -race \
            -ldflags "
                    $LDFLAGS
                    $LDFLAGS_STATIC_DOCKER
    

    After that, all binaries will be built with race detection. Note that this slows Docker down a lot.

    Docker-stress

    There is the amazing docker-stress tool from Spotify for Docker load testing. Usage is pretty straightforward:

    ./docker-stress -c 50 -t 5
    

    Here 50 clients try to run containers that stay alive for five seconds. docker-stress uses only docker run jobs for testing, so I also like to run something like this in parallel:

    docker events
    while true; do docker inspect $(docker ps -lq); done
    while true; do docker build -t test test; done
    

    and so on.

    You definitely need to read Contributing to Docker and Setting Up a Dev Environment. I really don’t think anything else is needed to start hacking on Docker.

    Conclusion

    This is all that I wanted to tell you about my first big open source experience. Also, just today the Docker folks launched some new projects and I am very excited about them. So, I want to invite you all to the magical world of Go, open source and, of course, Docker.

    Defer overhead in go

    Prelude

    This post is based on real events in the docker repository. When I discovered that my 20-percent-cooler refactoring made the Pop function 4-5 times slower, I did some research and concluded that the problem was using the defer statement for unlocking everywhere.

    In this post I’ll write a simple program and benchmarks that show that the defer statement can sometimes slow your program down a lot.

    Let’s create a simple queue with Put and Get methods. The following snippets show such a queue and benchmarks for it. I also wrote duplicate methods with and without defer.

    Code

    package defertest
    
    import (
        "sync"
    )
    
    type Queue struct {
        sync.Mutex
        arr []int
    }
    
    func New() *Queue {
        return &Queue{}
    }
    
    func (q *Queue) Put(elem int) {
        q.Lock()
        q.arr = append(q.arr, elem)
        q.Unlock()
    }
    
    func (q *Queue) PutDefer(elem int) {
        q.Lock()
        defer q.Unlock()
        q.arr = append(q.arr, elem)
    }
    
    func (q *Queue) Get() int {
        q.Lock()
        if len(q.arr) == 0 {
            q.Unlock()
            return 0
        }
        res := q.arr[0]
        q.arr = q.arr[1:]
        q.Unlock()
        return res
    }
    
    func (q *Queue) GetDefer() int {
        q.Lock()
        defer q.Unlock()
        if len(q.arr) == 0 {
            return 0
        }
        res := q.arr[0]
        q.arr = q.arr[1:]
        return res
    }
    

    Benchmarks

    package defertest
    
    import (
        "testing"
    )
    
    func BenchmarkPut(b *testing.B) {
        q := New()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            for j := 0; j < 1000; j++ {
                q.Put(j)
            }
        }
    }
    
    func BenchmarkPutDefer(b *testing.B) {
        q := New()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            for j := 0; j < 1000; j++ {
                q.PutDefer(j)
            }
        }
    }
    
    func BenchmarkGet(b *testing.B) {
        q := New()
        for i := 0; i < 1000; i++ {
            q.Put(i)
        }
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            for j := 0; j < 2000; j++ {
                q.Get()
            }
        }
    }
    
    func BenchmarkGetDefer(b *testing.B) {
        q := New()
        for i := 0; i < 1000; i++ {
            q.Put(i)
        }
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            for j := 0; j < 2000; j++ {
                q.GetDefer()
            }
        }
    }
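
    If you want to reproduce the comparison without a _test.go file, testing.Benchmark can drive a benchmark function from a plain program. The sketch below repeats Put and PutDefer so it is self-contained; unlike BenchmarkPut above, benchPut (a hypothetical helper) truncates the slice on every iteration so memory stays bounded as b.N grows. Expect numbers that differ from the results here: they depend on hardware and Go version, and Go 1.14 made most defers much cheaper.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// Queue repeats the article's Put and PutDefer so this file is self-contained.
type Queue struct {
	sync.Mutex
	arr []int
}

func (q *Queue) Put(elem int) {
	q.Lock()
	q.arr = append(q.arr, elem)
	q.Unlock()
}

func (q *Queue) PutDefer(elem int) {
	q.Lock()
	defer q.Unlock()
	q.arr = append(q.arr, elem)
}

// benchPut builds a benchmark that calls put 1000 times per iteration,
// reusing the slice's backing array so memory does not grow with b.N.
func benchPut(put func(*Queue, int)) func(*testing.B) {
	return func(b *testing.B) {
		q := &Queue{}
		for i := 0; i < b.N; i++ {
			q.arr = q.arr[:0]
			for j := 0; j < 1000; j++ {
				put(q, j)
			}
		}
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	fmt.Println("Put     ", testing.Benchmark(benchPut((*Queue).Put)))
	fmt.Println("PutDefer", testing.Benchmark(benchPut((*Queue).PutDefer)))
}
```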
    

    Results

    BenchmarkPut       50000             63002 ns/op
    BenchmarkPutDefer  10000            143391 ns/op
    BenchmarkGet       50000             72045 ns/op
    BenchmarkGetDefer  10000            249029 ns/op
    

    Conclusion

    You don’t need defers in small functions with one or two exit points.

    Update

    Retested with Go from tip, as Cezar Sá Espinola suggested. Here are the results:

    BenchmarkPut       50000             54633 ns/op
    BenchmarkPutDefer  10000            102971 ns/op
    BenchmarkGet       50000             65148 ns/op
    BenchmarkGetDefer  10000            180839 ns/op
    
    May 14, 2014

    Coverage for multiple packages in go

    Prelude

    There is awesome coverage support in Go. You can read about it here. But it also has some limitations. For example, let’s assume that we have the following code structure:

    src
    ├── pkg1
    │   ├── pkg11
    │   └── pkg12
    └── pkg2
        ├── pkg21
        └── pkg22
    

    pkg2, pkg21 and pkg22 use pkg1, pkg11 and pkg12 in different cases. So the question is: how can we compute the overall coverage for our code base?

    Generating cover profiles

    Let’s consider some possible go test commands with -coverprofile:

    go test -coverprofile=cover.out pkg2
    

    tests run only for pkg2, and the cover profile is generated only for pkg2

    go test -coverprofile=cover.out -coverpkg=./... pkg2
    

    tests run only for pkg2, and the cover profile is generated for all packages

    go test -coverprofile=cover.out -coverpkg=./... ./...
    

    boo hoo: cannot use test profile flag with multiple packages

    So, what can we do to run tests for all packages and get a cover profile for all packages?

    Merging cover profiles

    Now we are able to get a profile covering all packages for each package’s tests individually. It seems that we can merge these files. According to the cover source code, a profile file has the following structure:

    // First line is "mode: foo", where foo is "set", "count", or "atomic".
    // Rest of file is in the format
    //      encoding/base64/base64.go:34.44,37.40 3 1
    // where the fields are: name.go:line.column,line.column numberOfStatements count
    

    So, using the magic of awk, I found the following solution to this problem:

    go test -coverprofile=pkg1.cover.out -coverpkg=./... pkg1
    go test -coverprofile=pkg11.cover.out -coverpkg=./... pkg1/pkg11
    go test -coverprofile=pkg12.cover.out -coverpkg=./... pkg1/pkg12
    go test -coverprofile=pkg2.cover.out -coverpkg=./... pkg2
    go test -coverprofile=pkg21.cover.out -coverpkg=./... pkg2/pkg21
    go test -coverprofile=pkg22.cover.out -coverpkg=./... pkg2/pkg22
    echo "mode: set" > coverage.out && cat *.cover.out | grep -v mode: | sort -r | \
    awk '{if($1 != last) {print $0;last=$1}}' >> coverage.out
    

    I leave the true meaning of the last line as an exercise for the reader :) Now we have a profile for all code that was executed by all tests. We can use this merged profile coverage.out with the go cover tool:

    go tool cover -html=coverage.out
    

    or to generate a Cobertura report:

    gocover-cobertura < coverage.out > coverage.xml
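
    For the curious: the awk pipeline works because sort -r places, for each block, the line with count 1 before the line with count 0, and awk keeps only the first line per block, so a block counts as covered if any package’s tests covered it. The same merge in Go might look like the sketch below; mergeSetProfiles is a hypothetical helper, and like the awk version it only handles mode: set.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strings"
)

// mergeSetProfiles merges cover profiles in "mode: set": a block is covered
// in the result if any input marks it covered.
func mergeSetProfiles(profiles ...string) (string, error) {
	covered := map[string]bool{} // key: "name.go:l.c,l.c numStmts"
	for _, p := range profiles {
		sc := bufio.NewScanner(strings.NewReader(p))
		for sc.Scan() {
			line := strings.TrimSpace(sc.Text())
			if line == "" {
				continue
			}
			if strings.HasPrefix(line, "mode:") {
				if line != "mode: set" {
					return "", fmt.Errorf("unsupported %q, only mode: set", line)
				}
				continue
			}
			i := strings.LastIndexByte(line, ' ') // split off the count field
			if i < 0 {
				return "", fmt.Errorf("malformed line %q", line)
			}
			block, count := line[:i], line[i+1:]
			covered[block] = covered[block] || count != "0"
		}
		if err := sc.Err(); err != nil {
			return "", err
		}
	}
	blocks := make([]string, 0, len(covered))
	for b := range covered {
		blocks = append(blocks, b)
	}
	sort.Strings(blocks) // deterministic output
	var out strings.Builder
	out.WriteString("mode: set\n")
	for _, b := range blocks {
		c := "0"
		if covered[b] {
			c = "1"
		}
		fmt.Fprintf(&out, "%s %s\n", b, c)
	}
	return out.String(), nil
}

func main() {
	var inputs []string
	for _, name := range os.Args[1:] {
		data, err := os.ReadFile(name)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		inputs = append(inputs, string(data))
	}
	merged, err := mergeSetProfiles(inputs...)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(merged)
}
```

    Saved as, say, merge.go, it could replace the last line of the shell snippet with go run merge.go *.cover.out > coverage.out.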
    

    Conclusion

    Of course, this solution is only a workaround, and it works only for mode: set. Similar logic should be built into the cover tool. I really hope that one day we will be able to run

    go test -coverprofile=cover.out -coverpkg=./... ./...
    

    and lean back in our chairs, enjoying perfect cover profiles.

    May 6, 2014

    Deploying blog with docker and hugo

    Prelude

    Recently I moved my Jabber server to a DigitalOcean VPS. Running Prosody in Docker was so easy that I decided to create this blog. And, of course, to deploy it with Docker!

    Content

    First, we need a container with the templates and content for generating the blog. I used the following Dockerfile:

    FROM debian:jessie
    
    RUN apt-get update && apt-get install --no-install-recommends -y ca-certificates git-core
    RUN git clone http://github.com/LK4D4/lk4d4.darth.io.git /src
    VOLUME ["/src"]
    WORKDIR /src
    ENTRYPOINT ["git"]
    CMD ["pull"]
    

    There is no magic here: just clone the repo to /src (it will be used below) and update it on container start.

    Build image:

    docker build -t blog/content .
    

    Create data container:

    docker run --name blog_content blog/content
    

    To update content and templates from GitHub, we just need:

    docker start blog_content
    

    Hugo

    Hugo is a very fast static site generator written in Go (so many cool things are written in Go, btw).

    The idea is to run hugo in a Docker container so that it reads content from one directory and writes the generated blog to another.

    Hugo dockerfile:

    FROM crosbymichael/golang
    
    RUN apt-get update && apt-get install --no-install-recommends -y bzr
    
    RUN go get github.com/spf13/hugo
    
    VOLUME ["/var/www/blog"]
    
    ENTRYPOINT ["hugo"]
    CMD ["-w", "-s", "/src", "-d", "/var/www/blog"]
    

    So, here we go get hugo and use /src (remember it from the content container?) as its source directory and /var/www/blog as the destination.

    Now build the image and run a container with hugo:

    docker build -t blog/rendered .
    docker run --name blog --volumes-from blog_content blog/rendered
    

    Here is the trick with --volumes-from: we use /src from the blog_content container, and yeah, we’re going to use /var/www/blog from the blog container.

    Nginx

    So, now we have a container with templates and content, blog_content, and a container with the ready-to-use blog, blog. It’s time to show this blog to the world.

    I wrote a simple config for nginx:

    server {
        listen 80;
        server_name lk4d4.darth.io;
        location / {
            root /var/www/blog;
        }
    }
    

    Put it into a sites-enabled directory, which is used in this pretty Dockerfile:

    FROM dockerfile/nginx
    
    ADD sites-enabled/ /etc/nginx/sites-enabled
    

    Build the image and run a container with nginx:

    docker build -t nginx .
    docker run -p 80:80 -d --name=nginx --volumes-from=blog nginx
    

    That’s it: the blog is now running on lk4d4.darth.io and you can read it :) I can update it just with docker start blog_content.

    Conclusions

    It’s really fun to use Docker. You don’t need to install and remove tons of crap on the host machine; Docker can handle it all for you.