Container Checkpoint/Restore with CRIU

Container restarts are a common occurrence in production systems. Whether the cause is a node failure, scheduled maintenance, or resource rebalancing, a traditional container restart means losing all in-memory state and forcing the application to rebuild its working set from scratch. For stateful applications, this translates to service interruption, degraded performance, and a potential impact on end users.

CRIU (Checkpoint/Restore In Userspace) changes this equation entirely. Instead of killing and restarting containers, CRIU enables live migration of running containers with full state preservation, including memory contents, open file descriptors, and network connections.

Note: The “liveness” of live migration is implementation-specific. It depends on three factors: (a) whether the checkpointed process keeps running after the snapshot is taken; (b) how long it takes to restore the checkpointed process; and (c) whether there is downtime between the original process serving live traffic and the restored process being ready to serve it.

The Problem with Traditional Container Restarts#

When Kubernetes reschedules a pod or Docker restarts a container, the process is destructive:

SIGTERM sent to main process
SIGKILL after a grace period
All memory state is discarded
New container starts from scratch
Application rebuilds caches, reconnects to databases, and reloads configuration

For a web application with a 2GB in-memory cache, this may result in 30-60 seconds of degraded performance while the cache rebuilds. For a machine learning inference service with loaded models, the restart time could be several minutes.

How CRIU Works: Process State Serialization#

CRIU runs in userspace, as its name says, and relies on several Linux kernel interfaces to capture and restore complete process state:

Memory Dumping#

CRIU uses /proc/PID/pagemap and /proc/PID/maps to identify all memory regions belonging to a process tree. It then:

Freezes the process tree using ptrace(PTRACE_SEIZE)
Dumps all memory pages to disk
Captures memory mapping information (heap, stack, shared libraries)
Records memory protection flags and special mappings
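To make the two /proc interfaces above concrete, here is a minimal Python sketch that parses one line of /proc/PID/maps and decodes a few flag bits of a 64-bit /proc/PID/pagemap entry. It only illustrates the data formats and is not how CRIU itself is implemented; the names Mapping, parse_maps_line, and decode_pagemap_entry are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    start: int   # first address of the region
    end: int     # one past the last address
    perms: str   # e.g. "rw-p"
    path: str    # backing file, "[heap]", "[stack]", or "" for anonymous


def parse_maps_line(line: str) -> Mapping:
    """Parse one /proc/PID/maps line: start-end perms offset dev inode [path]."""
    fields = line.split(None, 5)
    start_s, end_s = fields[0].split("-")
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(int(start_s, 16), int(end_s, 16), fields[1], path)


def decode_pagemap_entry(entry: int) -> dict:
    """Decode selected flag bits of a 64-bit pagemap entry."""
    return {
        "present": bool((entry >> 63) & 1),     # page is in RAM
        "swapped": bool((entry >> 62) & 1),     # page is in swap
        "soft_dirty": bool((entry >> 55) & 1),  # written since last clear
    }


if __name__ == "__main__":
    sample = "55d0c0a00000-55d0c0a21000 rw-p 00000000 00:00 0    [heap]"
    region = parse_maps_line(sample)
    print(region.path, region.end - region.start, "bytes")
```

A dumper walks the regions from maps and consults pagemap to skip pages that were never touched, which keeps the image smaller than the process's full virtual address space.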

File Descriptor Preservation#

Every open file descriptor is catalogued and preserved:

Regular files: path and offset position
Sockets: protocol state, connection endpoints, buffer contents
Pipes: buffer data and connection topology
Device files: state-dependent handling
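For regular files, the path and offset mentioned above are visible from userspace through /proc. The following is a minimal sketch, assuming the text comes from /proc/PID/fdinfo/FD; the helper names parse_fdinfo, file_offset, and open_flags are hypothetical and are not part of CRIU.

```python
def parse_fdinfo(text: str) -> dict:
    """Turn the 'key:<tab>value' lines of /proc/PID/fdinfo/FD into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def file_offset(text: str) -> int:
    """Current file position, from the decimal 'pos' field."""
    return int(parse_fdinfo(text)["pos"])


def open_flags(text: str) -> int:
    """Open flags, which the kernel prints in octal."""
    return int(parse_fdinfo(text)["flags"], 8)


if __name__ == "__main__":
    sample = "pos:\t4096\nflags:\t0100002\nmnt_id:\t29\n"
    print(file_offset(sample), oct(open_flags(sample)))
```

On a Linux host, the matching path for a descriptor could be read with os.readlink on /proc/PID/fd/FD. Sockets and pipes need more than this, since their buffered data and peer state are not exposed in fdinfo.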

Process Tree Topology#

CRIU reconstructs the exact process hierarchy:

Parent-child relationships
Process groups and sessions
Signal handlers and pending signals
CPU registers and execution state
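The hierarchy information listed above is exposed per process in /proc/PID/stat. As an illustration only, this sketch parses the pid, parent pid, process group, and session from stat lines and groups children under their parents; parse_stat and build_tree are hypothetical names, and the sample lines are made up.

```python
from collections import defaultdict


def parse_stat(line: str) -> dict:
    """Parse the leading fields of a /proc/PID/stat line.

    The command name sits in parentheses and may itself contain spaces
    or parentheses, so split around the first '(' and the last ')'.
    """
    lparen = line.index("(")
    rparen = line.rindex(")")
    rest = line[rparen + 1:].split()
    # rest[0] = state, rest[1] = ppid, rest[2] = pgrp, rest[3] = session
    return {
        "pid": int(line[:lparen]),
        "comm": line[lparen + 1:rparen],
        "ppid": int(rest[1]),
        "pgrp": int(rest[2]),
        "session": int(rest[3]),
    }


def build_tree(procs: list) -> dict:
    """Map each parent pid to the sorted list of its child pids."""
    children = defaultdict(list)
    for proc in procs:
        children[proc["ppid"]].append(proc["pid"])
    return {ppid: sorted(pids) for ppid, pids in children.items()}


if __name__ == "__main__":
    lines = [
        "100 (nginx: master) S 1 100 100 0 -1",
        "101 (nginx: worker) S 100 100 100 0 -1",
    ]
    print(build_tree([parse_stat(l) for l in lines]))
```

On restore, the tree has to be recreated top-down, because a child can only be forked by a parent that already exists.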

Practical Implementation with Docker#

Let's walk through a real checkpoint/restore scenario. First, ensure CRIU is installed and your kernel supports the necessary features:
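A minimal sketch of such a workflow, written as a small Python wrapper around the criu and docker command lines. It assumes the Docker daemon has experimental features enabled and that criu check is run with sufficient privileges. The container name web-cache and checkpoint name cp1 are hypothetical, and these are not necessarily the exact commands the original walkthrough used.

```python
import shutil
import subprocess


def criu_check_cmd() -> list:
    """Command that asks CRIU to verify kernel support."""
    return ["criu", "check"]


def checkpoint_cmd(container: str, name: str, leave_running: bool = False) -> list:
    """Command that checkpoints a running container.

    By default Docker stops the container after the checkpoint;
    --leave-running keeps it running.
    """
    cmd = ["docker", "checkpoint", "create"]
    if leave_running:
        cmd.append("--leave-running")
    return cmd + [container, name]


def restore_cmd(container: str, name: str) -> list:
    """Command that starts a stopped container from a checkpoint."""
    return ["docker", "start", "--checkpoint", name, container]


def run(cmd: list) -> None:
    """Run a command, failing early if the binary is not installed."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names; printing instead of running keeps this side-effect free.
    for cmd in (criu_check_cmd(),
                checkpoint_cmd("web-cache", "cp1"),
                restore_cmd("web-cache", "cp1")):
        print(" ".join(cmd))
```

Passing each list to run() would execute the steps in order: verify the kernel, checkpoint the container, then start it again from the saved state.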

Integration with containerd#

Production Considerations#

Performance Impact#

Checkpoint operations aren't free:

Memory dump throughput: ~100MB/sec for typical workloads
Network freeze duration: 10-500ms depending on connection count
Restore time: Usually 2-5x faster than cold start
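As a rough worked example using the figures above, the following sketch estimates dump time for the 2GB cache mentioned earlier and the restore window implied by "2-5x faster than cold start". It takes 2GB as 2048MB and a 60-second cold start; both are assumptions for illustration, not measurements.

```python
def dump_seconds(memory_mb: float, rate_mb_per_s: float = 100.0) -> float:
    """Time to write memory pages to disk at a given dump rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return memory_mb / rate_mb_per_s


def restore_window(cold_start_s: float) -> tuple:
    """(best, worst) restore time if restore is 5x to 2x faster than cold start."""
    return (cold_start_s / 5, cold_start_s / 2)


if __name__ == "__main__":
    print(f"dump: {dump_seconds(2048):.2f} s")
    best, worst = restore_window(60)
    print(f"restore: {best:.0f}-{worst:.0f} s")
```

Under these assumptions the dump alone takes about 20 seconds, so whether checkpointing wins over a plain restart depends on how much of that time the application is frozen.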

Kernel Requirements#

CRIU requires specific kernel features:

CONFIG_CHECKPOINT_RESTORE=y
CONFIG_NAMESPACES=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
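A quick way to verify these options is to scan the kernel configuration text. This is a minimal sketch; missing_options is a hypothetical helper, and where the config text comes from depends on the distribution.

```python
REQUIRED = (
    "CONFIG_CHECKPOINT_RESTORE",
    "CONFIG_NAMESPACES",
    "CONFIG_PID_NS",
    "CONFIG_NET_NS",
)


def missing_options(config_text: str, required=REQUIRED) -> list:
    """Return the required options that are not set to 'y' in the config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_X is not set"
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value == "y":
            enabled.add(key)
    return [opt for opt in required if opt not in enabled]


if __name__ == "__main__":
    sample = (
        "CONFIG_CHECKPOINT_RESTORE=y\n"
        "CONFIG_NAMESPACES=y\n"
        "CONFIG_PID_NS=y\n"
        "# CONFIG_NET_NS is not set\n"
    )
    print(missing_options(sample))
```

On many systems the text can be read from /boot/config-RELEASE, where RELEASE is the output of uname -r, or from /proc/config.gz via gzip.open when the kernel exposes it.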

Security Implications#

Checkpoint images contain complete process memory:

Encrypt checkpoint storage
Implement access controls
Consider secrets in memory dumps
Validate checkpoint integrity
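For the integrity point, one simple approach is to record a digest when the checkpoint is written and compare it before restore. This is a sketch using only the Python standard library; the function names are hypothetical, and the checkpoint is assumed to be a single archive file.

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large images fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the recorded one in constant time."""
    return hmac.compare_digest(sha256_file(path), expected_hex)
```

A plain digest catches corruption in storage or transit. It does not stop an attacker who can replace both the image and the recorded digest; that case needs a keyed MAC or a signature with the key held elsewhere.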

Limitations and Gotchas#

Network Connections#

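Established TCP connections can be restored (CRIU's --tcp-established option), but only if the restored container keeps the same IP address; otherwise peers must reconnect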
UDP sockets restore more reliably
External services may timeout during migration

File System Dependencies#

Absolute paths must exist on restore host
Mounted volumes need identical configuration
Device files may not be portable

Container Runtime Integration#

Docker checkpoint support is experimental
Kubernetes native support is limited at the time of writing
Custom orchestration often required

Advanced Use Cases#

Database Migration#

For databases with large buffer pools:

Stateful Service Scaling#

CRIU enables novel scaling patterns:

Checkpoint running instance
Restore multiple copies for instant horizontal scaling
Preserve expensive initialization state

Future: Kubernetes Integration#

Several projects are working on Kubernetes integration:

Kubernetes Enhancement Proposal (KEP) for native checkpoint/restore
Podman checkpoint integration with CRI-O
Third-party operators for automated live migration, one of which is DevZero

Conclusion#

CRIU transforms container restart from a disruptive operation into seamless live migration. While not suitable for every workload, it's particularly valuable for:

Stateful applications with expensive initialization
Services with large in-memory caches
Long-running computations that need migration
Zero-downtime maintenance scenarios

The technology is production-ready for specific use cases, though broader ecosystem integration is still evolving. For organizations running stateful workloads at scale, CRIU provides a powerful tool for achieving true zero-downtime operations.

Ready to implement live migration in your infrastructure? Start with non-critical workloads, measure the performance characteristics, and gradually expand to more critical services as you build operational confidence.

