Checkpoint Warm Restart
Phase 6.4
Cold restart (Phase 6.2) reloads a fiber from its immutable .npk image — it starts from main(). Warm restart resumes from a saved snapshot — registers, call chain, and optionally domain state. The fiber picks up where it left off.
Design Principles
- The kernel saves the hard parts. Registers, stack, CSpace epoch — mechanical, deterministic, every fiber gets this for free. No fiber author involvement required.
- The fiber saves the domain parts. A navigation error accumulator, a partial spectrometer dataset, a protocol state machine — only the fiber knows what matters. It serializes its own state via a syscall.
- Save trigger is a syscall, not a timer. The fiber calls sys_checkpoint() at application-defined safe points. A fiber knows when its state is consistent better than any kernel heuristic. Timer-driven checkpoints risk snapshotting half-updated structures.
- Dual-copy with independent BLAKE3 hashes. A single corrupted checkpoint is worse than no checkpoint — you restore into corruption that doesn't crash, just silently misbehaves. Two copies, each verified. Both fail → cold restart. (BLAKE3 instead of XXH64 because checkpoint corruption leads to silent misbehavior in restored state — a stronger hash is warranted. The BEB uses XXH64 because its threat model is bitflip detection, not prevention of silent restore into corrupted state.)
- 32KB hard cap on application blob. Non-negotiable. A runaway fiber cannot blow the memory budget. The kernel returns CHECKPOINT_TOO_LARGE; the fiber decides what to cut.
Architecture
Fiber calls sys_checkpoint(blob_ptr, blob_len)
↓
L1 captures trap frame (registers) + copies stack + copies blob
↓
L1 computes BLAKE3 over entire checkpoint
↓
L1 writes to Copy A, then Copy B (dual-copy)
↓
L1 emits CheckpointWrite(64) STL event
↓
Returns to fiber — zero downtime
─── later, on panic ───
respawn_fiber() called
↓
Sterilize (CSpace revoke, TLB flush)
↓
Verify .npk signature (Ed25519)
↓
Reload ELF from immutable image (fresh code)
↓
check_checkpoint(fiber_id)
├─ Both copies fail BLAKE3 → CheckpointReject(65), cold restart
├─ No checkpoint exists → cold restart
└─ Valid checkpoint found:
↓
Restore stack from checkpoint
Restore registers from checkpoint
↓
blob_size > 0?
├─ No → re-enter at normal entry point (registers set PC)
└─ Yes → re-enter at restore_entry with (blob_ptr, blob_len) in a0/a1
Key insight: Warm restart still reloads the ELF. The code is always verified fresh from the immutable image. What the checkpoint saves is the execution context (where was I, what was on my stack) and domain state (what application data did I need). Code integrity is never compromised.
Checkpoint Format
Per-Fiber Checkpoint Blob
Fixed-size structure. Two independent copies per fiber, each with its own BLAKE3 hash.
pub const CHECKPOINT_MAGIC: u32 = 0x434B5021; // "CKP!"
pub const STACK_MAX: usize = 8192; // 8KB stack snapshot (current fiber stack is 4KB/0x1000;
// 8KB is forward-looking for expanded stacks in later phases.
// stack_size field records actual bytes captured, <= STACK_MAX)
pub const BLOB_MAX: usize = 32768; // 32KB application blob cap
pub const REGS_COUNT: usize = 32; // 32 GPRs (RISC-V)
pub const CheckpointData = extern struct {
// Header (24 bytes)
magic: u32, // "CKP!" — validity marker
fiber_id: u16, // Owning fiber (truncated from u64; assert fiber_id < MAX_FIBERS before cast)
_pad_hdr: u16, // Alignment
epoch: u32, // CSpace epoch at checkpoint time
_pad_epoch: u32, // Alignment
timestamp_ns: u64, // When the checkpoint was created
// Register state (272 bytes)
regs: [REGS_COUNT]u64, // x0–x31 (x0 is always zero but stored for simplicity)
sepc: u64, // Program counter at checkpoint
sstatus: u64, // Saved status register
// Stack snapshot (8196 bytes)
stack_size: u32, // Actual stack bytes captured (<= STACK_MAX)
stack_data: [STACK_MAX]u8,
// Application blob (32772 bytes)
blob_size: u32, // 0 if no application blob; <= BLOB_MAX
blob_data: [BLOB_MAX]u8,
// Integrity (32 bytes)
hash: [32]u8, // BLAKE3 of all bytes before this field
};
Size per checkpoint: 24 + 272 + 8196 + 32772 + 32 = 41,296 bytes (~40.3 KB).
Memory budget: 16 fibers x 2 copies x 41,296 bytes = 1,321,472 bytes (~1.26 MB).
Storage Region
Dedicated compile-time known physical memory region. NOT carved from the fiber's 1MB user space — checkpoint memory is kernel-only, inaccessible to userland.
pub const CHECKPOINT_REGION_BASE: usize = 0x8600_0000; // Below FIBER_BASE (0x8800_0000)
pub const CHECKPOINT_COPY_SIZE: usize = @sizeOf(CheckpointData);
pub const CHECKPOINT_REGION_SIZE: usize = MAX_FIBERS * 2 * CHECKPOINT_COPY_SIZE;
// ~1.26 MB total
/// Get pointer to checkpoint copy for a fiber.
/// copy_idx: 0 = Copy A, 1 = Copy B
pub fn checkpoint_addr(fiber_id: u64, copy_idx: u1) usize {
return CHECKPOINT_REGION_BASE
+ @as(usize, fiber_id) * 2 * CHECKPOINT_COPY_SIZE
+ @as(usize, copy_idx) * CHECKPOINT_COPY_SIZE;
}
Memory map (updated):
0x8000_0000 SysTable
0x8600_0000 Checkpoint Region (1.26 MB, kernel-only)
├─ Fiber 0 Copy A (41 KB)
├─ Fiber 0 Copy B (41 KB)
├─ Fiber 1 Copy A (41 KB)
├─ Fiber 1 Copy B (41 KB)
└─ ... (16 fibers)
0x8615_0000 (end of checkpoint region, rounded up to a 64 KiB boundary from 0x8614_2A00)
0x8800_0000 Fiber user memory (16 x 1 MB)
Syscall Interface
sys_checkpoint — Save
Syscall number: SYS_CHECKPOINT = 0x500 (survivability range; 0x200 is already VFS OPEN)
Arguments:
a0 = blob_ptr ([*]const u8 or 0 for no blob)
a1 = blob_len (u32, must be <= 32768; 0 for no blob)
Returns:
a0 = 0 on success
a0 = CHECKPOINT_TOO_LARGE (1) if blob_len > BLOB_MAX
a0 = CHECKPOINT_DISABLED (2) if fiber has no checkpoint capability
Routing: L1 handler (Option A). The ecall follows the existing path: trap_entry → rss_trap_handler → k_handle_syscall in L1 (Nim). The SYS_CHECKPOINT case is dispatched within k_handle_syscall alongside existing syscalls. The fiber is actively running when it calls sys_checkpoint(), so L1 is alive by definition. L1 calls down to checkpoint.zig for BLAKE3 computation and dual-copy writes via exported C ABI functions. The L1 handler:
- Captures the trap frame registers (already saved by trap_entry in entry_riscv.zig)
- Copies the fiber's stack from its user-mode stack region (with SUM access)
- If blob_ptr != 0: copies blob_len bytes from the fiber's address space
- Records the CSpace epoch from cspace_get(fiber_id).epoch
- Computes BLAKE3 over the entire CheckpointData (excluding the hash field)
- Writes Copy A, then Copy B
- Emits CheckpointWrite(64) with data0 = xxh64_of_hash (quick fingerprint for telemetry)
- Returns to the fiber via the normal ecall return path (sepc += 4)
Restore Entry Declaration
Fibers that use application blobs declare a restore entry point in their .npk manifest (BKDL section):
manifest {
name "nav-fiber"
restore-entry 0x80001234 // Called instead of entry on warm restart
}
If restore-entry is not declared and a checkpoint with blob_size > 0 exists, the kernel falls back to cold restart (the blob is discarded). This is a safety measure — the fiber must explicitly opt in to blob restore.
on_restore Callback Convention
When the kernel calls restore_entry instead of the normal entry point:
a0 = blob_ptr — pointer to a kernel-provided read-only buffer containing the blob
a1 = blob_len — size of the blob
The fiber rebuilds its domain state from the blob, then resumes normal operation. The stack is already restored (call chain intact, local variables present). The fiber's registers are restored. The fiber's sepc points to the instruction after the sys_checkpoint() ecall — so if the fiber simply returns from on_restore, it continues execution from the point of the last checkpoint.
Simplified pattern for fiber authors:
typedef struct { double acc; uint32_t iter; } state_t;

// In the fiber's main loop:
for (;;) {
    sensor_data = read_sensor();
    accumulator += process(sensor_data);
    iteration++;
    if (iteration % 1000 == 0) {
        // Checkpoint: save accumulator state at a known-consistent point
        state_t state = { accumulator, iteration };
        sys_checkpoint(&state, sizeof(state));
    }
}

// Declared in manifest: restore-entry = on_restore
void on_restore(uint8_t* blob, uint32_t len) {
    // Rebuild domain state from the blob
    accumulator = ((state_t*)blob)->acc;
    iteration = ((state_t*)blob)->iter;
    // Stack is already restored — returning continues the main loop
}
Restore Mechanics
In respawn_fiber() — Modified Flow
The existing 4-step respawn sequence becomes 5 steps:
Step 1: Sterilize (unchanged — CSpace revoke, TLB flush)
Step 2: Verify (unchanged — Ed25519 on .npk)
Step 3: Reload (unchanged — fresh ELF from .npk)
Step 4: Restore (NEW — checkpoint restore if available)
Step 5: Re-enter (modified — conditional entry point)
Step 4: Restore (new)
fn restore_from_checkpoint(fiber_id: u64) RestoreResult {
// 1. Try Copy A
const copy_a = checkpoint_ptr(fiber_id, 0);
if (verify_checkpoint(copy_a, fiber_id)) {
apply_checkpoint(fiber_id, copy_a);
return .Restored;
}
// 2. Try Copy B
const copy_b = checkpoint_ptr(fiber_id, 1);
if (verify_checkpoint(copy_b, fiber_id)) {
apply_checkpoint(fiber_id, copy_b);
return .Restored;
}
// 3. Both failed
_ = emit_stl(65, fiber_id, 0, 0, fiber_id, 0, 0); // CheckpointReject, reason=0 (hash_fail)
invalidate_checkpoint(fiber_id); // Clear invalid data
return .NoCheckpoint; // Fall back to cold restart
}
pub const RestoreResult = enum {
Restored, // Checkpoint applied successfully
NoCheckpoint, // No valid checkpoint — cold restart
};
verify_checkpoint
fn verify_checkpoint(ckp: *const CheckpointData, fiber_id: u64) bool {
if (ckp.magic != CHECKPOINT_MAGIC) return false;
// Compute BLAKE3 of everything before the hash field
const data_size = @offsetOf(CheckpointData, "hash");
const data_ptr: [*]const u8 = @ptrCast(ckp);
var hasher = std.crypto.hash.Blake3.init(.{});
hasher.update(data_ptr[0..data_size]);
var computed: [32]u8 = undefined;
hasher.final(&computed);
if (!std.mem.eql(u8, &computed, &ckp.hash)) return false;
// Epoch staleness check: if CSpace epoch has advanced too far since checkpoint,
// the capability state is too stale to trust. Threshold: 4 epoch bumps.
const cspace = @import("cspace.zig");
if (cspace.cspace_get(fiber_id)) |cs| {
const delta = cs.epoch -% ckp.epoch;
if (delta > EPOCH_STALENESS_THRESHOLD) return false;
}
return true;
}
pub const EPOCH_STALENESS_THRESHOLD: u32 = 4;
apply_checkpoint
fn apply_checkpoint(fiber_id: u64, ckp: *const CheckpointData) void {
// 1. Verify CSpace epoch is compatible
// If checkpoint epoch != current epoch, capabilities may have changed.
// For safety: re-grant from manifest (already done in Reload step).
// 2. Restore stack to fiber's stack region
const stack_base = get_fiber_load_base(fiber_id) + FIBER_STACK_OFFSET - ckp.stack_size;
const dest: [*]u8 = @ptrFromInt(stack_base);
@memcpy(dest[0..ckp.stack_size], ckp.stack_data[0..ckp.stack_size]);
// 3. Store register state for re-enter step
fiber_saved_regs[fiber_id] = ckp.regs;
fiber_saved_sepc[fiber_id] = ckp.sepc;
// 4. Update SP from saved registers (SP = x2 in RISC-V)
fiber_stack_pointers[fiber_id] = ckp.regs[2]; // x2 = sp
// 5. If blob present, copy to blob buffer in fiber's address space
// Blob buffer: last 32KB of the fiber's 1MB region, before the stack
// Address: FIBER_BASE + fiber_id * FIBER_MEM_SIZE + BLOB_OFFSET
// BLOB_OFFSET = FIBER_MEM_SIZE - 0x1000 (stack) - BLOB_MAX = 0xF7000
if (ckp.blob_size > 0) {
const blob_dest = get_fiber_load_base(fiber_id) + BLOB_OFFSET;
const dest: [*]u8 = @ptrFromInt(blob_dest);
@memcpy(dest[0..ckp.blob_size], ckp.blob_data[0..ckp.blob_size]);
fiber_blob_ptr[fiber_id] = blob_dest;
fiber_blob_len[fiber_id] = ckp.blob_size;
}
}
const BLOB_OFFSET: usize = FIBER_MEM_SIZE - 0x1000 - BLOB_MAX; // 0xF7000
Step 5: Re-enter (modified)
fn reenter(fiber_id: u64) void {
const restored = fiber_restored[fiber_id]; // Set by Step 4
if (restored and fiber_blob_len[fiber_id] > 0) {
// Warm restart with blob — use restore_entry from manifest
const re = npk_registry[fiber_id].restore_entry;
if (re != 0) {
// New function: enters userland with a0=blob_ptr, a1=blob_len
// instead of a0=systable. Same sret mechanics as hal_enter_userland.
hal_enter_userland_restore(
re, // entry = restore_entry
fiber_blob_ptr[fiber_id], // a0 = blob_ptr
fiber_blob_len[fiber_id], // a1 = blob_len
fiber_stack_pointers[fiber_id], // sp
);
return;
}
// No restore_entry declared — discard blob, cold restart
}
if (restored) {
// Warm restart without blob — resume from saved PC
// Stack and SP already restored by apply_checkpoint
hal_enter_userland(
fiber_saved_sepc[fiber_id],
SYSTABLE_BASE,
fiber_stack_pointers[fiber_id],
);
return;
}
// Cold restart — normal entry point
hal_enter_userland(
fiber_entry_points[fiber_id],
SYSTABLE_BASE,
fiber_stack_pointers[fiber_id],
);
}
hal_enter_userland_restore (new function in entry_riscv.zig)
Identical to hal_enter_userland except a0 and a1 carry blob pointer/length instead of SysTable address:
export fn hal_enter_userland_restore(entry: u64, blob_ptr: u64, blob_len: u64, sp: u64) callconv(.c) void {
var kstack: usize = 0;
asm volatile ("mv %[kstack], sp" : [kstack] "=r" (kstack));
asm volatile (
\\ li t0, 0x20
\\ csrs sstatus, t0
\\ li t1, 0x100
\\ csrc sstatus, t1
\\ li t2, 0x40000
\\ csrs sstatus, t2
\\ csrw sepc, %[entry]
\\ csrw sscratch, %[kstack]
\\ mv sp, %[sp]
\\ mv a0, %[blob_ptr]
\\ mv a1, %[blob_len]
\\ sret
:
: [entry] "r" (entry),
[blob_ptr] "r" (blob_ptr),
[blob_len] "r" (blob_len),
[sp] "r" (sp),
[kstack] "r" (kstack),
);
}
Checkpoint Invalidation
A checkpoint becomes invalid when:
- The .npk image changes (firmware update). Old checkpoints reference code that no longer exists. respawn_set_npk() must clear the checkpoint region for that fiber.
- Manual invalidation via ground station command or L1 verdict ring.
- Dual-copy verification failure — both copies fail BLAKE3 → checkpoint is cleared to prevent future attempts.
- CSpace epoch delta exceeds threshold — if the fiber's CSpace has been revoked and re-granted many times since the checkpoint, the checkpoint's capability state may be too stale. Threshold: configurable, default 4 epoch bumps.
pub fn invalidate_checkpoint(fiber_id: u64) void {
// Zero the magic bytes of both copies — fast invalidation
const copy_a: *CheckpointData = @ptrFromInt(checkpoint_addr(fiber_id, 0));
const copy_b: *CheckpointData = @ptrFromInt(checkpoint_addr(fiber_id, 1));
copy_a.magic = 0;
copy_b.magic = 0;
// reason codes: 0=hash_fail, 1=epoch_stale, 2=manual
// Note: this function always reports reason=manual; the restore path emits
// its own reason=0 event before calling here, so hash failures are still logged.
_ = emit_stl(65, fiber_id, 0, 0, fiber_id, 2, 0); // CheckpointReject, reason=manual
}
Performance Budget
| Operation | Budget | Notes |
|---|---|---|
| Save (kernel portion) | < 200 us | Copy 8KB stack + 32KB blob + BLAKE3 + dual write |
| Restore (kernel portion) | < 100 us | Verify BLAKE3 + copy stack + copy registers |
| Restore (fiber callback) | Fiber's problem | on_restore time is charged to the fiber's budget |
| BLAKE3 of 41KB | ~15 us | Zig std BLAKE3 on RISC-V @ 100MHz |
| Stack copy (8KB) | ~5 us | memcpy, cache-line aligned |
| Dual write (82KB) | ~50 us | Two sequential 41KB writes |
Save is infrequent (fiber-initiated, application-defined cadence). Restore happens only after a panic — latency is acceptable.
Ontology Events
| EventKind | Value | Emitter | data0 | data1 | data2 |
|---|---|---|---|---|---|
| CheckpointWrite | 64 | L1 (save handler) | XXH64 fingerprint of BLAKE3 hash | blob_size | stack_size |
| CheckpointReject | 65 | L0 (restore) or L1 (invalidation) | fiber_id | reason (0=hash_fail, 1=epoch_stale, 2=manual) | — |
| CheckpointRegionInit | 66 | L0 (boot) | region_base (0x8600_0000) | region_size | — |
CheckpointRegionInit is emitted once at boot by init_region(), proving the checkpoint memory was explicitly zeroed this boot cycle. Any checkpoint whose timestamp predates this event is provably stale.
NPK Registry Extension
The NpkEntry in respawn.zig gains a restore_entry field:
pub const NpkEntry = struct {
fiber_id: u64,
image_offset: u64,
image_size: u64,
pubkey: [32]u8,
path_hash: u64,
restore_entry: u64, // NEW: 0 = no restore callback; nonzero = on_restore address
};
Populated at boot time from the .npk manifest's restore-entry field. Default 0 (cold restart only).
Files
New Files
| File | Purpose | Est. LOC | Layer |
|---|---|---|---|
| hal/checkpoint.zig | Checkpoint storage, format, verify, apply, invalidate | ~300 | L0 |
Modified Files
| File | Change |
|---|---|
| hal/respawn.zig | Integrate restore_from_checkpoint() into respawn flow (between Reload and Re-enter); modify reenter() for conditional entry point; remove CheckpointStatus stub (moved to checkpoint.zig) |
| hal/respawn.zig | Add restore_entry field to NpkEntry |
| hal/entry_riscv.zig | Route SYS_CHECKPOINT ecall to L1 handler |
| hal/abi.zig | Comptime import checkpoint.zig; FFI re-exports |
| core/kernel.nim | Add SYS_CHECKPOINT handler — capture registers, stack, blob, compute BLAKE3, dual-copy write, emit STL event |
| core/fiber.nim | Add checkpoint-related metadata (optional — or store in checkpoint.zig's own per-fiber table) |
Testing Strategy
Unit Tests (checkpoint.zig, zig build test)
- Checkpoint format: @sizeOf(CheckpointData) comptime assert
- Write + verify round-trip: create checkpoint, compute BLAKE3, verify passes
- Corrupted checkpoint: flip one bit → verify fails
- Dual-copy: corrupt Copy A, verify Copy B succeeds
- Both copies corrupted → NoCheckpoint result
- Magic byte invalidation: invalidate_checkpoint() → verify returns false
- Blob cap enforcement: blob_len > BLOB_MAX → rejected
- checkpoint_addr() arithmetic: fiber 0 copy 0, fiber 15 copy 1 → correct addresses
- Epoch staleness: checkpoint epoch vs current epoch delta > threshold → reject
Integration Tests
- Full save → panic → restore cycle: fiber checkpoints, inject fault, verify warm restart (stack restored, registers restored)
- Save with blob → restore with blob: verify on_restore entry called with correct blob pointer and length
- Save without blob → restore: verify normal entry point (PC from checkpoint) used
- Dual-copy corruption: corrupt Copy A after save, inject fault, verify Copy B used
- Both copies corrupted: inject fault → verify cold restart from .npk
- Checkpoint after firmware update: change .npk, inject fault → verify checkpoint invalidated, cold restart used
Related Specifications
- Phase 6.2: Restart Trap — cold restart pipeline (sterilize, verify, reload, re-enter)
- Phase 6.3: Adaptive FDIR — anomaly detection that triggers respawn
- SPEC-020: Capability Algebra — CSpace epoch semantics during restore
- SPEC-060: System Ontology — CheckpointWrite (64), CheckpointReject (65) events
- SPEC-085: BEB dual-copy pattern — mirrors the dual-copy + independent hash approach