Containers: How Processes Stay Separate
The Container Isolation Illusion
When you run an application in a container, using a tool like Docker or Kubernetes, it feels like it’s running in its own little world. It has its own filesystem, its own network, and its own set of processes. But how does this isolation actually happen? It’s not magic; it’s Linux.
Linux provides a couple of core features that make containerization possible: namespaces and cgroups. These are the unsung heroes that keep your containers from stepping on each other’s toes.
Namespaces: Giving Processes Their Own View
Think of namespaces as creating a specific view of the system for a process. Each namespace is a way to partition kernel resources such that one set of processes sees one set of resources, and another set of processes sees a different set.
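One way to make this concrete is to look at the namespace handles that Linux exposes for every process under /proc/<pid>/ns. The sketch below is only an illustration, assuming a Linux host with /proc mounted; the helper names (parse_ns_link, list_namespaces, shared_namespaces) are made up for this example. Two processes are in the same namespace of a given type when their links carry the same inode number.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into (kind, inode)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def list_namespaces(pid="self"):
    """Return {entry name: inode} for a process, or {} if /proc is unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        entries = os.listdir(ns_dir)
    except OSError:
        return result
    for entry in sorted(entries):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        except (OSError, ValueError):
            continue  # skip entries we cannot read or do not recognize
        result[entry] = inode
    return result


def shared_namespaces(a, b):
    """Names of the namespaces that two processes have in common."""
    return sorted(name for name in a if name in b and a[name] == b[name])


if __name__ == "__main__":
    for name, inode in list_namespaces().items():
        print(f"{name:20} {inode}")
```

Run on the host and again inside a container, the inode numbers for entries such as pid, net, and mnt should differ, which is the "different view" described above.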
Let’s break down the most common types of namespaces:
- PID Namespace: This gives processes their own private Process ID (PID) tree. The init process inside a container, for example, will have PID 1, just like the init process on a bare-metal system. Processes inside the namespace can’t see processes outside it. The parent namespace can still see the container’s processes, but under different PIDs.
Imagine you have two processes, each running in its own PID namespace. Process A sees PIDs 1, 2, 3. Process B sees PIDs 1, 2, 3. But PID 2 in Process A’s world is not the same as PID 2 in Process B’s world. They are isolated.
- Network Namespace: This gives processes their own network stack. Each network namespace has its own interfaces, IP addresses, routing tables, and port numbers. This is why you can run two containers on the same host, both listening on port 80, and they won’t conflict.
When a container gets a network namespace, it often gets a virtual network interface. This interface is connected to a virtual bridge on the host, allowing communication between containers and the outside world, but keeping the internal network details private to the namespace.
- Mount Namespace: This provides processes with their own view of the filesystem. When a process is in a new mount namespace, it starts with a copy of the parent’s mount points. It can then mount and unmount filesystems without affecting other processes.
This is crucial for container images. The container’s filesystem is essentially a read-only layer (the image) overlaid with a writable layer. This mount namespace ensures that any changes made within the container don’t alter the host’s filesystem or other containers’.
- User Namespace: This allows a process to have a different set of user and group IDs inside the namespace than it has outside. For example, a process could be root (UID 0) inside its user namespace but mapped to a non-privileged user on the host. This significantly enhances security by limiting the potential damage if a process is compromised.
- UTS Namespace: This isolates the hostname and domain name. Each UTS namespace can have its own hostname, allowing you to name containers independently.
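The ID mapping mentioned in the User Namespace item can be sketched in a few lines. The kernel publishes the mapping for a process in /proc/<pid>/uid_map, one range per line, in the form "first ID inside, first ID outside, length". The code below is a minimal illustration of that translation, not the kernel's implementation; the helper names and the example range (0 mapped to 100000, 65536 IDs long) are assumptions chosen for the demonstration.

```python
from typing import List, NamedTuple, Optional


class IdRange(NamedTuple):
    inside: int   # first ID as seen inside the user namespace
    outside: int  # first ID as seen in the parent namespace
    length: int   # number of consecutive IDs covered


def parse_id_map(text: str) -> List[IdRange]:
    """Parse uid_map/gid_map content: 'inside outside length' per line."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        ranges.append(IdRange(*(int(f) for f in fields)))
    return ranges


def to_host_id(ranges: List[IdRange], inside_id: int) -> Optional[int]:
    """Translate an ID inside the namespace to the ID seen outside it."""
    for r in ranges:
        if r.inside <= inside_id < r.inside + r.length:
            return r.outside + (inside_id - r.inside)
    return None  # unmapped IDs have no counterpart outside


# Hypothetical mapping: root in the container is UID 100000 on the host.
EXAMPLE_MAP = "         0     100000      65536\n"

if __name__ == "__main__":
    ranges = parse_id_map(EXAMPLE_MAP)
    print(to_host_id(ranges, 0))
```

With this example mapping, UID 0 inside the namespace corresponds to UID 100000 outside, so a process that is root in the container holds only the rights of an ordinary, unprivileged account on the host.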
Cgroups: Keeping Resource Usage in Check
While namespaces control what processes can see, cgroups (control groups) manage how much of the system’s resources they can use. Cgroups allow you to allocate, limit, and prioritize system resources like CPU, memory, and disk I/O for a collection of processes. Network bandwidth can be shaped as well, though that typically relies on the kernel’s traffic-control tools working alongside cgroups.
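To see which cgroup a process belongs to, you can read /proc/<pid>/cgroup, where each line has the form "hierarchy-ID:controller-list:path". The sketch below parses that format; the function names and the sample paths in the comments are hypothetical. On a host using only cgroup v2, there is a single line with hierarchy 0 and an empty controller list.

```python
def parse_proc_cgroup(text):
    """Parse /proc/<pid>/cgroup lines of the form 'ID:controllers:path'."""
    entries = []
    for line in text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        hierarchy_id, controllers, path = parts
        entries.append({
            "hierarchy": int(hierarchy_id),
            "controllers": controllers.split(",") if controllers else [],
            "path": path,
        })
    return entries


def unified_path(entries):
    """Return the cgroup v2 path (hierarchy 0, no controller list), if any."""
    for entry in entries:
        if entry["hierarchy"] == 0 and not entry["controllers"]:
            return entry["path"]
    return None


def read_own_cgroup():
    """Parse this process's cgroup membership, or [] if /proc is unavailable."""
    try:
        with open("/proc/self/cgroup") as f:
            return parse_proc_cgroup(f.read())
    except OSError:
        return []


if __name__ == "__main__":
    # Hypothetical sample line; real paths depend on the host and runtime.
    print(unified_path(parse_proc_cgroup("0::/system.slice/demo.service\n")))
    print(read_own_cgroup())
```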
This is essential for multi-tenancy and stability. Without cgroups, a runaway process in one container could consume all the CPU or memory, starving other applications on the same host. Cgroups ensure that each container gets a fair share (or a pre-defined share) of the system’s resources.
For example, you can set a hard limit on memory usage for a container. If its processes try to exceed that limit and the kernel can’t reclaim enough memory, the out-of-memory (OOM) killer terminates a process within that cgroup to stay within the bounds. You can also limit CPU usage, ensuring one container doesn’t monopolize the processor.
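On a cgroup v2 host, these limits are plain text files inside the cgroup's directory: memory.max holds a byte count or the word "max", and cpu.max holds a quota and a period in microseconds (for example "50000 100000" for half a CPU). The sketch below reads them back; it assumes cgroup v2, and the helper names are invented for illustration. The directory to pass in depends on your host and runtime.

```python
from pathlib import Path


def parse_memory_max(text):
    """Return the memory limit in bytes, or None when the file says 'max'."""
    value = text.strip()
    return None if value == "max" else int(value)


def parse_cpu_max(text):
    """Return the CPU limit as a number of CPUs, or None when unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def read_limits(cgroup_dir):
    """Read the limits from a cgroup v2 directory; unreadable files are skipped."""
    base = Path(cgroup_dir)
    parsers = (("memory.max", parse_memory_max), ("cpu.max", parse_cpu_max))
    limits = {}
    for name, parser in parsers:
        try:
            limits[name] = parser((base / name).read_text())
        except (OSError, ValueError):
            continue  # file missing, unreadable, or not in the expected format
    return limits
```

A value of None means the file was present but no limit is set, while a missing key means the file could not be read, which is what you would expect on a host that still uses cgroup v1.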
How They Work Together
Container runtimes like Docker or containerd orchestrate the creation of these namespaces and cgroups. When you run docker run, Docker (or its underlying components) creates a new set of namespaces for the process, configures the necessary network interfaces within its network namespace, mounts the container’s filesystem within its mount namespace, and then places the process within a cgroup to control its resource consumption.
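The sequence above can be approximated with two kernel interfaces: the unshare(2) system call for namespaces and the cgroup.procs file for cgroup membership. The following is a rough sketch under stated assumptions, not how Docker or containerd are actually implemented: it calls libc through ctypes, assumes Linux with cgroup v2, and the function names are hypothetical. Real runtimes do considerably more, such as setting up the root filesystem and network interfaces.

```python
import ctypes
import os

# Namespace flag values from <linux/sched.h>.
CLONE_FLAGS = {
    "mnt": 0x00020000,   # CLONE_NEWNS
    "uts": 0x04000000,   # CLONE_NEWUTS
    "ipc": 0x08000000,   # CLONE_NEWIPC
    "user": 0x10000000,  # CLONE_NEWUSER
    "pid": 0x20000000,   # CLONE_NEWPID
    "net": 0x40000000,   # CLONE_NEWNET
}


def namespace_flags(kinds):
    """Combine namespace kinds such as ['uts', 'net'] into one flag mask."""
    flags = 0
    for kind in kinds:
        flags |= CLONE_FLAGS[kind]
    return flags


def unshare(kinds):
    """Detach the calling process into new namespaces of the given kinds.

    For 'pid', the new namespace applies to children created afterwards.
    Usually requires root or a user namespace.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(namespace_flags(kinds)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def join_cgroup(cgroup_dir, pid):
    """Move a process into a cgroup v2 group by writing to cgroup.procs."""
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(str(pid))
```

A caller with sufficient privileges might run unshare(["uts", "mnt", "net"]) and then join_cgroup on a directory it created under the cgroup filesystem, before executing the container's program. Neither function is invoked here because both change the state of the running process.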
This combination of namespaces for isolation and cgroups for resource control is what gives containers their power and their perceived “sandboxed” nature. It’s a fundamental part of how modern cloud-native applications are built and deployed, providing a consistent and predictable environment across different machines.