NCC Group recently published a Technical Advisory about a vulnerability that allows a container with access to the host network namespace to interact with the container runtime's API over Unix domain sockets. This access can then be exploited further to interact with the host system (escaping from the container to the host). The issue has been assigned CVE-2020-15257.
This post attempts to explain, in simple terms, the issue caused by containerd's containerd-shim API being exposed to host network containers, and what it means for attackers and defenders in the real world.
containerd is the container runtime that Docker Engine uses for all container-related activity, managing container filesystems, namespaces and I/O, amongst other things. containerd-shim is a binary that containerd starts for each container, for example when a container is started with the docker command. You can see containerd-shim being invoked by starting a container and inspecting the processes on the host:
docker ps # make sure no containers are running
ps fauxx | grep containerd
docker run -d --name vuln-container ubuntu sleep infinity
ps fauxx | grep containerd
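You can also script this check. The following is a minimal Python sketch, assuming a Linux host with a readable /proc. It does roughly what `ps ... | grep containerd` does: it walks /proc/<pid>/cmdline and reports the processes whose executable name is exactly containerd-shim. The function name `find_processes` is hypothetical. The exact-name match is a simplification, because newer containerd releases also ship shim binaries such as containerd-shim-runc-v2, which you would pass as the name instead.

```python
import os


def find_processes(name: str, proc_root: str = "/proc") -> list[tuple[int, str]]:
    """Return (pid, command line) for processes whose argv[0] basename equals `name`."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "net"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited or its cmdline is not readable
        # cmdline is a NUL-separated argv; kernel threads have an empty one
        argv = [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]
        if argv and os.path.basename(argv[0]) == name:
            matches.append((int(entry), " ".join(argv)))
    return sorted(matches)
```

Running `find_processes("containerd-shim")` before and after `docker run` should show a new entry once the container is up, mirroring the `ps` output.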
containerd-shim carries out the container lifecycle operations (typically by invoking the low-level runtime, runc) and exposes them to containerd via an API, which is why the binary is called a shim. In the simplest terms, a shim in computing sits transparently between two programs and translates requests and parameters into a form that the program behind it understands. The containerd-shim API is exposed via an abstract Unix domain socket that is accessible in the host machine's network namespace. Any process that can reach and interact with this abstract Unix domain socket can invoke the functions that the API supports.
Abstract Unix domain sockets are Linux-specific sockets that have no corresponding file on the filesystem. Instead, they are tied to the network namespace of the process that created them. By convention, an abstract Unix domain socket's name starts with a NUL character (\0), which distinguishes it from a filesystem path.
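The following Python sketch illustrates the naming convention on Linux using only the standard library. It binds a listener to a name that begins with a NUL byte, connects to it and exchanges one message. The helper name `abstract_socket_roundtrip` and the socket name you pass are hypothetical. No file is created for the socket, and Python reports the bound address as bytes that keep the leading NUL.

```python
import os
import socket


def abstract_socket_roundtrip(name: str, payload: bytes = b"ping") -> tuple[bytes, bytes]:
    """Bind an abstract-namespace listener, connect to it and pass one message through."""
    address = "\0" + name  # leading NUL byte = abstract namespace (Linux only)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(address)
        server.listen(1)
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(address)
            conn, _ = server.accept()
            with conn:
                client.sendall(payload)
                received = conn.recv(len(payload))
        # Abstract addresses come back as bytes that still start with NUL
        bound_name = server.getsockname()
    return bound_name, received
```

Calling it leaves nothing behind on disk: `os.path.exists(name)` stays False, because the name lives only in the network namespace.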
Running a container with host network privileges (--net host on Docker or .spec.hostNetwork: true on Kubernetes) places it in the host's network namespace. If the container also runs as UID 0 (the user inside the container is root and the container was not started with the --user option), it gets full access to that namespace. For example, a process started inside a container with host network privileges can use the host's network stack directly, such as listening for connections on the host's network interfaces. Because the containerd-shim APIs were exposed through abstract Unix domain sockets in that namespace, such a container could reach them. This can lead to serious security problems, including the ability to read and write the host filesystem, execute commands on the host as root and start other containers at will.
The CVE was fixed in containerd v1.4.3 and v1.3.9 by switching from abstract sockets to plain file-based Unix sockets under /run/containerd. To see the version of containerd on your system, run docker version; containerd is listed among the Server components.
What does this mean to an attacker?
This is not a network-exploitable vulnerability. Several conditions must hold before an attacker can reach the socket used to trigger the API calls:
- The attacker would have to compromise the container or pod and gain root access within it
- The container would have to be running with host network privileges, --net host on Docker or .spec.hostNetwork: true on Kubernetes
An attacker could gain access to a target container by exploiting a different vulnerability, for example in an exposed service. A remote code execution (RCE) flaw in a web application or service layer could give access to the container or pod. Attackers usually obtain a stable reverse shell connection to continue their post-exploitation shenanigans.
As the container is running with host network privileges and as UID 0, the attacker would be able to access the host’s abstract Unix domain sockets. If you are playing along at home:
- run docker run -d --name vuln-container --net host ubuntu sleep infinity to start an Ubuntu container with access to the host network namespace
- inside the container (for example via docker exec -it vuln-container bash), run cat /proc/net/unix | grep 'containerd-shim' | grep '@' (or netstat -xl, if net-tools is installed) to see that the host's abstract containerd-shim Unix domain socket is visible
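You can also perform the /proc/net/unix check programmatically. This Python sketch assumes the usual /proc/net/unix layout: a header row, then seven fixed columns and an optional Path column. It returns the abstract socket paths (shown with a leading @) that contain containerd-shim. The function names `abstract_unix_sockets` and `read_proc_net_unix` are hypothetical.

```python
def abstract_unix_sockets(proc_net_unix: str, needle: str = "containerd-shim") -> list[str]:
    """Return abstract socket paths (displayed with a leading '@') containing `needle`."""
    names = []
    for line in proc_net_unix.splitlines()[1:]:  # skip the header row
        # Num RefCount Protocol Flags Type St Inode [Path]; Path may be absent
        fields = line.split(maxsplit=7)
        if len(fields) < 8:
            continue  # unnamed socket: no Path column
        path = fields[7]
        if path.startswith("@") and needle in path:
            names.append(path)
    return names


def read_proc_net_unix(path: str = "/proc/net/unix") -> str:
    """Read the kernel's table of Unix domain sockets for the current network namespace."""
    with open(path) as f:
        return f.read()
```

Inside a host-network container on an unpatched host, `abstract_unix_sockets(read_proc_net_unix())` would be expected to return the shim socket(s). After the fix, the shim sockets are file-based under /run/containerd and no longer appear with an @ prefix.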
Because abstract Unix domain socket names start with a NUL character, tools that take a filesystem path, such as netcat, cannot address them directly. socat and cURL can reach them only through dedicated options (ABSTRACT-CONNECT and --abstract-unix-socket respectively). The @ character is how tools display that leading NUL byte, since Ctrl+@ (^@) is the conventional way to enter the NUL character in terminals.
containerd-shim exposes a number of gRPC APIs, and interacting with them requires a gRPC client. As no exploit code was released for this vulnerability, an attacker would have to build their own client to invoke the APIs, for example in Java.
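The @ notation is only a display convention, so a program has to restore the NUL byte before connecting. This Python sketch, with hypothetical helper names, converts a displayed name into a real abstract address and checks whether anything is listening on it. It only tests reachability and does not speak the shim's API. It also assumes that only the leading @ stands for NUL, since an @ later in a displayed name may be either a literal @ or another NUL byte.

```python
import socket


def display_name_to_address(display_name: str) -> bytes:
    """Turn '@name' as printed by /proc/net/unix or netstat into b'\\0name'."""
    if not display_name.startswith("@"):
        raise ValueError("not an abstract socket name: %r" % display_name)
    return b"\0" + display_name[1:].encode()


def can_connect(display_name: str, timeout: float = 1.0) -> bool:
    """Return True if a stream listener accepts connections on the abstract name."""
    address = display_name_to_address(display_name)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(address)  # AF_UNIX accepts a bytes address
        except OSError:
            return False  # e.g. ECONNREFUSED when nothing is bound to the name
    return True
```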
According to NCC Group's original advisory, the following exploitation scenarios are possible:
- Arbitrary file reads
- Arbitrary file appends
- Arbitrary file writes
- Arbitrary command execution in the context of containerd-shim (root)
- Creating a container from a runc config.json file
- Starting a created container
An attacker could use this vulnerability, together with the API it exposes, to escape from the container to the host. From a post-exploitation perspective, this opens up a whole world of possibilities. The attacker could take over the node/host, access other pods and containers, access secrets, and discover additional assets to identify and attack services. Depending on how the environment is set up, the attacker could also move to a different cluster or host and gain access to additional systems.
What does this mean to a defender/admin/maintainer? How do I know I’m safe?
If your version of containerd is older than 1.4.3/1.3.9, you are vulnerable. Of course, the issue manifests only if you are running containers in the host network namespace AND with UID 0. Updating to a fixed version is a good first step. However, any containers started before the update remain vulnerable afterwards, so all running containers must be stopped and restarted once the update is complete.
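The version rule above can be written as a small predicate. This Python sketch, with hypothetical function names, treats the following as vulnerable: 1.3.x below 1.3.9, 1.4.x below 1.4.3, and anything older than 1.3. It treats 1.5 and later as fixed. It assumes upstream version numbering. Distribution packages may backport the fix without changing the upstream version, so check your vendor's advisory as well. Containers started before the upgrade keep running under the old shim, so the second function takes the containerd version the container was started under, not the one currently installed.

```python
import re

# Accepts "1.4.3", "v1.3.8", "1.4.3~ds1" and similar; only the numeric prefix is used
_VERSION_RE = re.compile(r"v?(\d+)\.(\d+)(?:\.(\d+))?")


def parse_version(version: str) -> tuple[int, int, int]:
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def containerd_is_vulnerable(version: str) -> bool:
    """CVE-2020-15257 was fixed upstream in 1.3.9 and 1.4.3."""
    major, minor, patch = parse_version(version)
    if (major, minor) < (1, 3):
        return True  # older release lines did not receive the fix
    if (major, minor) == (1, 3):
        return patch < 9
    if (major, minor) == (1, 4):
        return patch < 3
    return False  # later release lines include the fix


def container_is_exposed(started_under_version: str, host_network: bool, uid: int) -> bool:
    """All three conditions from the text must hold at the same time."""
    return containerd_is_vulnerable(started_under_version) and host_network and uid == 0
```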
You can identify potentially vulnerable running containers using the following commands:
- Docker: docker ps --filter 'network=host'
- Kubernetes: kubectl get pods -A -o json | jq -c '.items[] | select(.spec.hostNetwork==true) | [.metadata.namespace, .metadata.name]'
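For larger clusters you can post-process the kubectl output in code. This Python sketch, with a hypothetical function name, takes the parsed JSON from `kubectl get pods -A -o json` and returns the host-network pods, together with the containers that may run as UID 0. It only looks at `runAsUser` at the pod and container level. A missing value is treated as possibly root, because the image's default user then applies. `runAsNonRoot` and init containers are ignored for brevity.

```python
def host_network_pods(pod_list: dict) -> list[dict]:
    """Return host-network pods and the names of containers that may run as root."""
    findings = []
    for pod in pod_list.get("items", []):
        spec = pod.get("spec") or {}
        if not spec.get("hostNetwork", False):
            continue
        meta = pod.get("metadata") or {}
        pod_uid = (spec.get("securityContext") or {}).get("runAsUser")
        possibly_root = []
        for container in spec.get("containers") or []:
            # A container-level runAsUser overrides the pod-level one
            uid = (container.get("securityContext") or {}).get("runAsUser", pod_uid)
            # No runAsUser anywhere: the image default user applies, often root
            if uid is None or uid == 0:
                possibly_root.append(container.get("name", ""))
        findings.append({
            "namespace": meta.get("namespace", ""),
            "name": meta.get("name", ""),
            "possibly_root": possibly_root,
        })
    return findings
```

One way to use it is to capture the kubectl output (for example with `subprocess.run(..., capture_output=True, text=True)`), parse it with `json.loads` and pass the result in. Pods with a non-empty `possibly_root` list deserve the closest look.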
The fix in containerd v1.4.3/v1.3.9 moves away from abstract sockets to file-based Unix sockets and relies on the underlying host's filesystem permissions to control access. You can see the relevant commit at https://github.com/containerd/containerd/commit/126b35ca433b18b236db85ee7f87831e1dce56c9
Running containers in the host network namespace is considered insecure regardless of this vulnerability. It is therefore recommended to reach the container through its Docker IP address or to publish only the required service port to the host. For Kubernetes, using port-forward instead of the node's network namespace is far more secure and reliable.
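A file-based Unix socket is an ordinary filesystem object. Access to it therefore depends on file permissions and on whether its path is visible in the caller's mount namespace. This Python sketch, with a hypothetical helper name, creates such a socket in a temporary directory, restricts it with `chmod` and reads the mode back. On Linux, connecting to a stream socket also requires write permission on the socket file.

```python
import os
import socket
import stat
import tempfile


def file_socket_mode(mode: int = 0o600) -> tuple[bool, int]:
    """Create a filesystem-backed Unix socket, restrict it and report (is_socket, mode)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)  # unlike an abstract socket, this creates an inode on disk
            os.chmod(path, mode)  # ordinary permission bits now gate who may connect
            info = os.stat(path)
            return stat.S_ISSOCK(info.st_mode), stat.S_IMODE(info.st_mode)
```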
Additionally, if using --net host is unavoidable, the advisory recommends starting the container as a user with a non-zero UID, for example:
docker run -d --user 1001 --net host --name ubuntu-container ubuntu sleep infinity
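On the Docker side, you can check the same two conditions from `docker inspect` output. This Python sketch, with a hypothetical function name, takes the parsed JSON array that `docker inspect` prints. It flags containers whose `HostConfig.NetworkMode` is `host` and whose `Config.User` is empty, `0` or `root`. Treating an empty `User` as root is an assumption. It holds for images such as ubuntu that do not set a USER, but not necessarily for every image.

```python
def docker_host_network_root(inspect_output: list[dict]) -> list[str]:
    """Return names of host-network containers that appear to run as root."""
    flagged = []
    for container in inspect_output:
        network_mode = (container.get("HostConfig") or {}).get("NetworkMode", "")
        user = (container.get("Config") or {}).get("User", "")
        user_part = user.split(":", 1)[0]  # "1001:1001" -> "1001"
        # Empty User means the image default applies; treated here as possibly root
        if network_mode == "host" and user_part in ("", "0", "root"):
            flagged.append(container.get("Name", "").lstrip("/"))
    return flagged
```

A container started with --user 1001, like the example above, would not be flagged. The earlier vuln-container, started without --user, would be.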
Additional mitigations are discussed here, including the usage of AppArmor and SELinux
Although this is an interesting security problem, the real-world likelihood of an attacker gaining access to a container or pod that also uses the host's network namespace AND runs as UID 0 is far slimmer than it might first appear. Still, as best practices go, it is always recommended to:
- patch when there is a release (especially security related)
- follow the principle of least privilege
- run workloads in isolated namespaces
- enforce ingress rules for whitelisted IPs when possible
- validate access and security of the application layer services and apps and
- never provide more privileges than what is needed for your containers