Pre-alpha — APIs, wire formats, and behavior may change without notice. Expect breaking changes; use with caution.
emberd

The Guest Rootfs

The custom initramfs, the overlayfs root, and the initramfs-style switch_root that emberd-init performs as PID 1.

When a sandbox boots, the kernel runs emberd's own initramfs — a tiny cpio archive whose /init is a statically-linked emberd-init. As PID 1, that binary builds the real root filesystem and then serves the control plane.

The initramfs

rootfs/build.sh produces ~/firecracker-verify/emberd-initramfs.cpio:

CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
  go build -trimpath -ldflags="-s -w" -o "$STAGE/init" ./cmd/emberd-init
mkdir -p "$STAGE"/{proc,sys,dev,newroot,tmp}   # mountpoints emberd-init needs
( cd "$STAGE" && find . | cpio --create --format=newc ) > emberd-initramfs.cpio

Building with CGO_ENABLED=0 means the binary is fully static — the initramfs carries no libc, no shell, nothing but /init and a handful of empty mountpoints (~2.5 MiB total). Everything else comes from the language pack.
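One way to check the "fully static" claim on a built binary is to look for a PT_INTERP program header, which is how an ELF file requests a dynamic linker. The sketch below is not part of emberd; the helper names isStatic and isStaticPath are hypothetical, and it uses only Go's standard debug/elf package.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// isStatic reports whether the ELF file has no PT_INTERP program header,
// meaning it does not ask the kernel to load a dynamic linker.
func isStatic(f *elf.File) bool {
	for _, p := range f.Progs {
		if p.Type == elf.PT_INTERP {
			return false
		}
	}
	return true
}

// isStaticPath opens the file at path and applies isStatic to it.
// (Hypothetical helper, named here for illustration only.)
func isStaticPath(path string) (bool, error) {
	f, err := elf.Open(path)
	if err != nil {
		return false, fmt.Errorf("open %s: %w", path, err)
	}
	defer f.Close()
	return isStatic(f), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: staticcheck <path-to-binary>")
		os.Exit(2)
	}
	static, err := isStaticPath(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("static:", static)
}
```

Pointing it at the staged init binary should print "static: true" if the build flags took effect.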

The overlay root

The sandbox needs a writable root, but the base image must stay immutable, so emberd-init assembles an overlayfs:

  • lower — the language-pack squashfs, mounted read-only.
  • upper + work — a fresh tmpfs (per-VM, lives in RAM).
  • merged — the overlay of the two; this becomes the new root.

Writes land in the tmpfs upper; the squashfs is never touched and its pages are shareable across sandboxes. Reset semantics are trivial: the tmpfs dies with the VM.

The read-only squashfs (/dev/vda, lower layer) and a per-VM tmpfs (upper + work) are merged by overlayfs; switch_root makes the merged view the writable, per-VM, ephemeral root /.
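The three layers map onto a single overlayfs mount call. The following is a minimal sketch of that call, assuming the layer directories already exist; overlayData and mountOverlay are hypothetical names and not necessarily how emberd-init structures its code.

```go
package main

import (
	"fmt"
	"syscall"
)

// overlayData builds the option string that overlayfs expects as mount data.
func overlayData(lower, upper, work string) string {
	return fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s", lower, upper, work)
}

// mountOverlay mounts the merged view at target. It needs root privileges
// and a kernel with overlayfs; upper and work must be on the same filesystem.
func mountOverlay(lower, upper, work, target string) error {
	data := overlayData(lower, upper, work)
	if err := syscall.Mount("overlay", target, "overlay", 0, data); err != nil {
		return fmt.Errorf("mount overlay on %s: %w", target, err)
	}
	return nil
}

func main() {
	// Print the option string only; the mount itself is left to PID 1.
	fmt.Println(overlayData("/lower", "/overlay/upper", "/overlay/work"))
}
```

The option string printed by main is the same one that later shows up in /proc/mounts for the root entry.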

The boot sequence

bootstrapPID1() runs only when os.Getpid() == 1 (so host-side unit tests of the same binary skip it):

  1. Mount proc on /proc, devtmpfs on /dev (this is what makes /dev/vda appear), and sysfs on /sys — all inside the initramfs.
  2. Mount the squashfs (/dev/vda) read-only at /lower.
  3. Mount a tmpfs at /overlay; create upper/ and work/ on it.
  4. mount -t overlay with lowerdir=/lower,upperdir=/overlay/upper,workdir=/overlay/work at /newroot.
  5. MS_MOVE /proc, /dev, /sys into /newroot (carrying their subtrees).
  6. switch_root into /newroot.
  7. Set a default PATH and HOME (the kernel starts PID 1 with a minimal environment that contains no PATH, so python3 isn't resolvable until this is done).
  8. Read emberd.interpreter= from /proc/cmdline (see language packs).
  9. Start the child reaper (a SIGCHLD handler). As PID 1, emberd-init inherits any process a workload double-forks; the reaper wait4s them so they don't leak as zombies, while still letting runExec collect the interpreter's own exit code.
  10. Serve the vsock control plane.
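Step 8 amounts to scanning the kernel command line for a key=value token. The sketch below shows one way to do that; cmdlineParam is a hypothetical helper, the sample command line is illustrative, and quoted values containing spaces are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// cmdlineParam returns the value of the first whitespace-separated token of
// the form key=value in a kernel command line, and whether it was found.
func cmdlineParam(cmdline, key string) (string, bool) {
	prefix := key + "="
	for _, field := range strings.Fields(cmdline) {
		if strings.HasPrefix(field, prefix) {
			return strings.TrimPrefix(field, prefix), true
		}
	}
	return "", false
}

func main() {
	// Illustrative command line; inside the guest this text would come
	// from reading /proc/cmdline.
	cmdline := "panic=1 emberd.interpreter=python3\n"
	if v, ok := cmdlineParam(cmdline, "emberd.interpreter"); ok {
		fmt.Println("interpreter:", v)
	} else {
		fmt.Println("emberd.interpreter not set")
	}
}
```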

Why switch_root and not pivot_root?

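pivot_root(2) is rejected on an initramfs: the initramfs is the kernel's rootfs mount, which has no parent mount to pivot away from, so the call fails with EINVAL. emberd therefore uses the same technique as BusyBox's switch_root: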

syscall.Chdir(newRoot)            // cd /newroot
syscall.Mount(".", "/", "", syscall.MS_MOVE, "")  // move merged root onto /
syscall.Chroot(".")               // chroot into it
syscall.Chdir("/")

Moving the /proc, /dev, /sys submounts into /newroot before the final MS_MOVE means they ride along when the whole subtree relocates to /.
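Steps 5 and 6 can be sketched as two functions with error handling added. This is an illustration under assumptions rather than emberd's actual code: moveSubmounts and switchRoot are hypothetical names, and the target directories (for example /newroot/proc) are assumed to already exist in the merged root.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// moveSubmounts relocates already-mounted filesystems under newRoot with
// MS_MOVE. Each target directory must already exist inside newRoot.
func moveSubmounts(newRoot string, mounts ...string) error {
	for _, m := range mounts {
		target := filepath.Join(newRoot, m)
		if err := syscall.Mount(m, target, "", syscall.MS_MOVE, ""); err != nil {
			return fmt.Errorf("move %s to %s: %w", m, target, err)
		}
	}
	return nil
}

// switchRoot makes newRoot the root filesystem: chdir into it, move it
// onto /, chroot into the moved mount, then reset the working directory.
func switchRoot(newRoot string) error {
	if err := syscall.Chdir(newRoot); err != nil {
		return fmt.Errorf("chdir %s: %w", newRoot, err)
	}
	if err := syscall.Mount(".", "/", "", syscall.MS_MOVE, ""); err != nil {
		return fmt.Errorf("move %s onto /: %w", newRoot, err)
	}
	if err := syscall.Chroot("."); err != nil {
		return fmt.Errorf("chroot: %w", err)
	}
	if err := syscall.Chdir("/"); err != nil {
		return fmt.Errorf("chdir /: %w", err)
	}
	return nil
}

func main() {
	// Guard: this only makes sense as PID 1 inside an initramfs.
	if os.Getpid() != 1 {
		fmt.Println("not PID 1; skipping switch_root")
		return
	}
	if err := moveSubmounts("/newroot", "/proc", "/dev", "/sys"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := switchRoot("/newroot"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Checking each call matters here because a failure partway through leaves PID 1 in a half-switched state; returning the error lets the caller exit so the guest fails visibly.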

You can confirm it worked from inside a running sandbox — /proc/mounts shows:

overlay / overlay rw,relatime,lowerdir=/lower,upperdir=/overlay/upper,workdir=/overlay/work 0 0

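/ is the overlay, so writes anywhere succeed: new files (e.g. under /root) land directly in the tmpfs upper layer, and modifying a file that exists only in the squashfs (e.g. under /etc) first copies it up into the upper layer.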

Kernel requirements

The guest kernel needs SQUASHFS, OVERLAY_FS, DEVTMPFS, TMPFS, and VIRTIO_BLK (plus the vsock options from the control-plane page). The Firecracker CI kernel has all of them built in, verified by extracting its embedded config — which is why emberd boots on the stock kernel with no rebuild.
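A check like this can be scripted against an extracted kernel config. The sketch below assumes the usual CONFIG_ prefix for the options named above and requires each to be built in (=y), since the initramfs carries no modules; missingBuiltins is a hypothetical helper, and the vsock options are left out because they are listed on another page.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// required lists the options named in this section, with the CONFIG_ prefix.
var required = []string{
	"CONFIG_SQUASHFS",
	"CONFIG_OVERLAY_FS",
	"CONFIG_DEVTMPFS",
	"CONFIG_TMPFS",
	"CONFIG_VIRTIO_BLK",
}

// missingBuiltins returns the required options that are not set to "y"
// in the given kernel config text. Options set to "m" count as missing.
func missingBuiltins(config string) []string {
	builtin := map[string]bool{}
	for _, line := range strings.Split(config, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasSuffix(line, "=y") {
			builtin[strings.TrimSuffix(line, "=y")] = true
		}
	}
	var missing []string
	for _, opt := range required {
		if !builtin[opt] {
			missing = append(missing, opt)
		}
	}
	return missing
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: kconfigcheck <extracted-config-file>")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if missing := missingBuiltins(string(data)); len(missing) > 0 {
		fmt.Println("not built in:", strings.Join(missing, ", "))
		os.Exit(1)
	}
	fmt.Println("all required options are built in")
}
```

The input file is the embedded config extracted from the kernel image, for example with the kernel tree's scripts/extract-ikconfig when the kernel was built with CONFIG_IKCONFIG.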

The kernel boot args still include panic=1. If emberd-init ever exits or the bootstrap fails, the kernel panics because PID 1 has died, and panic=1 makes it reboot one second later rather than hang, so the VM dies, which is the correct outcome for a sandbox whose agent is gone. The VM stays alive purely because PID 1 (emberd-init) blocks forever in its vsock accept loop; nothing on the host needs to hold the VMM's stdin open (it's wired to /dev/null).
