Unprivileged containers in Go, Part 4: Network namespace
Network namespace
From man namespaces:
Network namespaces provide isolation of the system resources associated with
networking: network devices, IPv4 and IPv6 protocol stacks, IP routing tables,
firewalls, the /proc/net directory, the /sys/class/net directory, port numbers
(sockets), and so on. A physical network device can live in exactly one
network namespace.
A virtual network device ("veth") pair provides a pipe-like abstraction that
can be used to create tunnels between network namespaces, and can be used to
create a bridge to a physical network device in another namespace.
A network namespace allows you to isolate the network stack for your container. Note
that it does not include the hostname: that is the job of the UTS namespace.
We can create a network namespace with the flag syscall.CLONE_NEWNET in
SysProcAttr.Cloneflags. After namespace creation there are only autogenerated
network interfaces in it (in most cases only the loopback interface). So we need to inject
some network interface into the namespace, which allows the container to talk to other
containers. We will use veth pairs for this, as mentioned in the man page.
It’s not the only way and probably not the best, but it is the best known and is used in Docker
by default.
Unet
To create interfaces we will need a new binary with the suid bit set, because
these are pretty privileged operations. We could create them with the awesome iproute2
set of utilities, but I decided to write everything in Go, because it’s fun and I want
to promote the awesome netlink library: with this library you can do any operation on networking stuff.
I called the new binary unet, you can find it in the unc repo:
https://github.com/LK4D4/unc/tree/master/unet
Bridge
First of all we need to create a bridge. Here is sample code from unet, which
sets up the bridge:
const (
	bridgeName = "unc0"
	ipAddr     = "10.100.42.1/24"
)

func createBridge() error {
	// try to get bridge by name, if it already exists then just exit
	_, err := net.InterfaceByName(bridgeName)
	if err == nil {
		return nil
	}
	if !strings.Contains(err.Error(), "no such network interface") {
		return err
	}
	// create *netlink.Bridge object
	la := netlink.NewLinkAttrs()
	la.Name = bridgeName
	br := &netlink.Bridge{LinkAttrs: la}
	if err := netlink.LinkAdd(br); err != nil {
		return fmt.Errorf("bridge creation: %v", err)
	}
	// set up ip address for bridge
	addr, err := netlink.ParseAddr(ipAddr)
	if err != nil {
		return fmt.Errorf("parse address %s: %v", ipAddr, err)
	}
	if err := netlink.AddrAdd(br, addr); err != nil {
		return fmt.Errorf("add address %v to bridge: %v", addr, err)
	}
	// bring the bridge up (ip link set dev unc0 up)
	if err := netlink.LinkSetUp(br); err != nil {
		return err
	}
	return nil
}
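Matching on the error text works, but it depends on the exact message. As a hedged alternative, the existence check can be sketched with only the standard library by listing interfaces and comparing names. The helper ifaceExists is a hypothetical name, not something from unet.

```go
package main

import (
	"fmt"
	"net"
)

// ifaceExists is a hypothetical helper: it reports whether an interface
// with the given name is present, without inspecting error strings.
func ifaceExists(name string) (bool, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return false, err
	}
	for _, iface := range ifaces {
		if iface.Name == name {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	ok, err := ifaceExists("unc0")
	if err != nil {
		fmt.Println("listing interfaces:", err)
		return
	}
	fmt.Println("unc0 exists:", ok)
}
```

The trade-off is one full interface listing per check, which is negligible for a one-time bridge setup.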
I hardcoded the bridge name and IP address for simplicity. Then we need to create
a veth pair, attach one side of it to the bridge and put the other side into our
namespace. We will identify the namespace by PID:
const vethPrefix = "uv"

func createVethPair(pid int) error {
	// get bridge to set as master for one side of veth-pair
	br, err := netlink.LinkByName(bridgeName)
	if err != nil {
		return err
	}
	// generate names for interfaces
	x1, x2 := rand.Intn(10000), rand.Intn(10000)
	parentName := fmt.Sprintf("%s%d", vethPrefix, x1)
	peerName := fmt.Sprintf("%s%d", vethPrefix, x2)
	// create *netlink.Veth
	la := netlink.NewLinkAttrs()
	la.Name = parentName
	la.MasterIndex = br.Attrs().Index
	vp := &netlink.Veth{LinkAttrs: la, PeerName: peerName}
	if err := netlink.LinkAdd(vp); err != nil {
		return fmt.Errorf("veth pair creation %s <-> %s: %v", parentName, peerName, err)
	}
	// get peer by name to put it to namespace
	peer, err := netlink.LinkByName(peerName)
	if err != nil {
		return fmt.Errorf("get peer interface: %v", err)
	}
	// put peer side to network namespace of specified PID
	if err := netlink.LinkSetNsPid(peer, pid); err != nil {
		return fmt.Errorf("move peer to ns of %d: %v", pid, err)
	}
	// bring up the side that stays attached to the bridge
	if err := netlink.LinkSetUp(vp); err != nil {
		return err
	}
	return nil
}
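One detail in the name generation above: two independent rand.Intn(10000) calls can return the same number, and then both sides of the pair would get the same name. A minimal sketch of a generator that avoids this is below. The function name vethNames and the constant maxIfaceNameLen are assumptions for illustration, and the sketch does not check for clashes with interfaces that already exist on the host.

```go
package main

import (
	"fmt"
	"math/rand"
)

const vethPrefix = "uv"

// maxIfaceNameLen is the longest interface name Linux accepts
// (IFNAMSIZ is 16, including the trailing NUL byte).
const maxIfaceNameLen = 15

// vethNames is a hypothetical helper that returns two distinct names
// for the two sides of a veth pair.
func vethNames() (parent, peer string) {
	x1 := rand.Intn(10000)
	x2 := rand.Intn(10000)
	// redraw the second suffix until it differs from the first
	for x2 == x1 {
		x2 = rand.Intn(10000)
	}
	return fmt.Sprintf("%s%d", vethPrefix, x1), fmt.Sprintf("%s%d", vethPrefix, x2)
}

func main() {
	parent, peer := vethNames()
	fmt.Println(parent, peer, len(parent) <= maxIfaceNameLen)
}
```

With a two-character prefix and at most four digits, the names stay well below the 15-character limit.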
This attaches one side of the pair to unc0. But it’s not all that easy: don’t
forget that we are talking about unprivileged containers, so we need
to run all the code as an unprivileged user, while this particular part must be
executed with root rights. We can set the suid bit for this, which allows an
unprivileged user to run the binary as privileged. I did the following:
$ go get github.com/LK4D4/unc/unet
$ sudo chown root:root $(go env GOPATH)/bin/unet
$ sudo chmod u+s $(go env GOPATH)/bin/unet
$ sudo ln -s $(go env GOPATH)/bin/unet /usr/bin/unet
That’s all you need to run this binary. Actually you don’t need to run it,
unc will do this :)
Waiting for interface
Now we can create interfaces in the namespace of a specified PID. But a process expects
the network to be ready when it starts, so we need to somehow wait in the fork
part of the program until the interface is created by unet, before calling
syscall.Exec. I decided to use a pretty simple idea for this: just poll the
interface list until the first veth device appears. Let’s modify our
container.Start to put the interface into the namespace after we start the fork process:
-	return cmd.Run()
+	if err := cmd.Start(); err != nil {
+		return err
+	}
+	logrus.Debugf("container PID: %d", cmd.Process.Pid)
+	if err := putIface(cmd.Process.Pid); err != nil {
+		return err
+	}
+	return cmd.Wait()
The function putIface just calls unet with the PID as an argument:
const suidNet = "unet"

func putIface(pid int) error {
	cmd := exec.Command(suidNet, strconv.Itoa(pid))
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("unet: out: %s, err: %v", out, err)
	}
	return nil
}
func waitForIface() (netlink.Link, error) {
	logrus.Debugf("Starting to wait for network interface")
	start := time.Now()
	for {
		fmt.Printf(".")
		if time.Since(start) > 5*time.Second {
			fmt.Printf("\n")
			return nil, fmt.Errorf("failed to find veth interface in 5 seconds")
		}
		// get list of all interfaces
		lst, err := netlink.LinkList()
		if err != nil {
			fmt.Printf("\n")
			return nil, err
		}
		for _, l := range lst {
			// if we found "veth" interface - it's time to continue setup
			if l.Type() == "veth" {
				fmt.Printf("\n")
				return l, nil
			}
		}
		time.Sleep(100 * time.Millisecond)
	}
}
On the other side, waitForIface runs before execProc in fork. So now we have a veth
interface and we can continue with its setup and process execution.
Network setup
Now the easiest part: we just need to assign an IP to our new interface and bring it up.
I added an IP field to the Cfg struct:
type Cfg struct {
	Hostname string
	Mounts   []Mount
	Rootfs   string
+	IP       string
}
and filled it with a pseudorandom IP from the bridge subnet (10.100.42.0/24):
const ipTmpl = "10.100.42.%d/24"
defaultCfg.IP = fmt.Sprintf(ipTmpl, rand.Intn(253)+2)
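The expression rand.Intn(253)+2 yields a last octet from 2 to 254, which skips the network address, the bridge address (.1) and the broadcast address (.255). Two containers can still draw the same address. The sketch below only validates a chosen host number against the bridge subnet using the standard library. containerIP and bridgeCIDR are hypothetical names, and the values are copied from the constants shown earlier.

```go
package main

import (
	"fmt"
	"net"
)

const (
	bridgeCIDR = "10.100.42.1/24" // same value as ipAddr of the bridge
	ipTmpl     = "10.100.42.%d/24"
)

// containerIP is a hypothetical helper: it builds an address from ipTmpl and
// checks that it lies in the bridge subnet and is not the bridge address.
func containerIP(host int) (string, error) {
	if host < 1 || host > 254 {
		return "", fmt.Errorf("host number %d out of range 1..254", host)
	}
	bridgeIP, subnet, err := net.ParseCIDR(bridgeCIDR)
	if err != nil {
		return "", err
	}
	cidr := fmt.Sprintf(ipTmpl, host)
	ip, _, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", err
	}
	if !subnet.Contains(ip) {
		return "", fmt.Errorf("%s is outside of %s", ip, subnet)
	}
	if ip.Equal(bridgeIP) {
		return "", fmt.Errorf("%s is the bridge address", ip)
	}
	return cidr, nil
}

func main() {
	fmt.Println(containerIP(24))
	fmt.Println(containerIP(1))
}
```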
Code for network setup:
func setupIface(link netlink.Link, cfg Cfg) error {
	// up loopback
	lo, err := netlink.LinkByName("lo")
	if err != nil {
		return fmt.Errorf("lo interface: %v", err)
	}
	if err := netlink.LinkSetUp(lo); err != nil {
		return fmt.Errorf("up lo: %v", err)
	}
	// assign the configured address to the veth interface
	addr, err := netlink.ParseAddr(cfg.IP)
	if err != nil {
		return fmt.Errorf("parse IP: %v", err)
	}
	return netlink.AddrAdd(link, addr)
}
Talking containers
Let’s try to connect our containers. I presume here that we’re in a directory with a busybox rootfs:
$ unc sh
$ ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: sit0@NONE: <NOARP> mtu 1480 qdisc noop
link/sit 0.0.0.0 brd 0.0.0.0
475: uv5185@if476: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether e2:2b:71:19:73:73 brd ff:ff:ff:ff:ff:ff
inet 10.100.42.24/24 scope global uv5185
valid_lft forever preferred_lft forever
inet6 fe80::e02b:71ff:fe19:7373/64 scope link
valid_lft forever preferred_lft forever
$ unc ping -c 1 10.100.42.24
PING 10.100.42.24 (10.100.42.24): 56 data bytes
64 bytes from 10.100.42.24: seq=0 ttl=64 time=0.071 ms
--- 10.100.42.24 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.071/0.071/0.071 ms
They can talk! It’s like magic, right? You can find all the code under the tag netns.
You can install the latest versions of unc and unet with
go get github.com/LK4D4/unc/...
Binaries will be created in $(go env GOPATH)/bin/.
The end
This is the last post about unprivileged containers (at least about namespaces). We created an isolated environment for a process, which you can run as an unprivileged user. Containers, though, are a little more than just isolation: you also want to specify what a process can do inside the container (Linux capabilities), how many resources it can use (cgroups), and you can imagine many other things. I invite you to look at what we have in runc/libcontainer. It’s not very easy code, but I hope that after my posts you will be able to understand it. If you have any questions, feel free to write to me, I’m always happy to share my humble knowledge about containers.
Previous parts: