Checkpoint Warm Restart

Phase 6.4

Cold restart (Phase 6.2) reloads a fiber from its immutable .npk image — it starts from main(). Warm restart resumes from a saved snapshot — registers, call chain, and optionally domain state. The fiber picks up where it left off.

Design Principles

  1. The kernel saves the hard parts. Registers, stack, CSpace epoch — mechanical, deterministic, every fiber gets this for free. No fiber author involvement required.
  2. The fiber saves the domain parts. A navigation error accumulator, a partial spectrometer dataset, a protocol state machine — only the fiber knows what matters. It serializes its own state via a syscall.
  3. Save trigger is a syscall, not a timer. The fiber calls sys_checkpoint() at application-defined safe points. A fiber knows when its state is consistent better than any kernel heuristic. Timer-driven checkpoints risk snapshotting half-updated structures.
  4. Dual-copy with independent BLAKE3 hashes. A single corrupted checkpoint is worse than no checkpoint — you restore into corruption that doesn't crash, just silently misbehaves. Two copies, each verified. Both fail → cold restart. (BLAKE3 instead of XXH64 because checkpoint corruption leads to silent misbehavior in restored state — a stronger hash is warranted. The BEB uses XXH64 because its threat model is bitflip detection, not prevention of silent restore into corrupted state.)
  5. 32KB hard cap on application blob. Non-negotiable. A runaway fiber cannot blow the memory budget. The kernel returns CHECKPOINT_TOO_LARGE; the fiber decides what to cut.

Architecture

Fiber calls sys_checkpoint(blob_ptr, blob_len)

L1 captures trap frame (registers) + copies stack + copies blob

L1 computes BLAKE3 over entire checkpoint

L1 writes to Copy A, then Copy B (dual-copy)

L1 emits CheckpointWrite(64) STL event

Returns to fiber — zero downtime

            ─── later, on panic ───

respawn_fiber() called

Sterilize (CSpace revoke, TLB flush)

Verify .npk signature (Ed25519)

Reload ELF from immutable image (fresh code)

check_checkpoint(fiber_id)
    ├─ Both copies fail BLAKE3 → CheckpointReject(65), cold restart
    ├─ No checkpoint exists → cold restart
    └─ Valid checkpoint found:

        Restore stack from checkpoint
        Restore registers from checkpoint

        blob_size > 0?
        ├─ No  → re-enter at normal entry point (registers set PC)
        └─ Yes → re-enter at restore_entry with (blob_ptr, blob_len) in a0/a1

Key insight: Warm restart still reloads the ELF. The code is always verified fresh from the immutable image. What the checkpoint saves is the execution context (where was I, what was on my stack) and domain state (what application data did I need). Code integrity is never compromised.

Checkpoint Format

Per-Fiber Checkpoint Blob

Fixed-size structure. Two independent copies per fiber, each with its own BLAKE3 hash.

zig
pub const CHECKPOINT_MAGIC: u32 = 0x434B5021; // "CKP!"
pub const STACK_MAX: usize = 8192;            // 8KB stack snapshot (current fiber stack is 4KB/0x1000;
                                               //   8KB is forward-looking for expanded stacks in later phases.
                                               //   stack_size field records actual bytes captured, <= STACK_MAX)
pub const BLOB_MAX: usize = 32768;            // 32KB application blob cap
pub const REGS_COUNT: usize = 32;             // 32 GPRs (RISC-V)

pub const CheckpointData = extern struct {
    // Header (24 bytes)
    magic: u32,             // "CKP!" — validity marker
    fiber_id: u16,          // Owning fiber (truncated from u64; assert fiber_id < MAX_FIBERS before cast)
    _pad_hdr: u16,          // Alignment
    epoch: u32,             // CSpace epoch at checkpoint time
    _pad_epoch: u32,        // Alignment
    timestamp_ns: u64,      // When the checkpoint was created

    // Register state (272 bytes)
    regs: [REGS_COUNT]u64,  // x0–x31 (x0 is always zero but stored for simplicity)
    sepc: u64,              // Program counter at checkpoint
    sstatus: u64,           // Saved status register

    // Stack snapshot (8196 bytes)
    stack_size: u32,        // Actual stack bytes captured (<= STACK_MAX)
    stack_data: [STACK_MAX]u8,

    // Application blob (32772 bytes)
    blob_size: u32,         // 0 if no application blob; <= BLOB_MAX
    blob_data: [BLOB_MAX]u8,

    // Integrity (32 bytes)
    hash: [32]u8,           // BLAKE3 of all bytes before this field
};

Size per checkpoint: 24 + 272 + 8196 + 32772 + 32 = 41,296 bytes (~40.3 KB).

Memory budget: 16 fibers x 2 copies x 41,296 bytes = 1,321,472 bytes (~1.26 MB).

Storage Region

A dedicated physical memory region at a compile-time-known address. NOT carved from the fiber's 1MB user space — checkpoint memory is kernel-only, inaccessible to userland.

zig
pub const CHECKPOINT_REGION_BASE: usize = 0x8600_0000; // Below FIBER_BASE (0x8800_0000)
pub const CHECKPOINT_COPY_SIZE: usize = @sizeOf(CheckpointData);
pub const CHECKPOINT_REGION_SIZE: usize = MAX_FIBERS * 2 * CHECKPOINT_COPY_SIZE;
// ~1.26 MB total

/// Get pointer to checkpoint copy for a fiber.
/// copy_idx: 0 = Copy A, 1 = Copy B
pub fn checkpoint_addr(fiber_id: u64, copy_idx: u1) usize {
    return CHECKPOINT_REGION_BASE
        + @as(usize, fiber_id) * 2 * CHECKPOINT_COPY_SIZE
        + @as(usize, copy_idx) * CHECKPOINT_COPY_SIZE;
}

Memory map (updated):

0x8000_0000  SysTable
0x8600_0000  Checkpoint Region (1.26 MB, kernel-only)
             ├─ Fiber 0 Copy A (41 KB)
             ├─ Fiber 0 Copy B (41 KB)
             ├─ Fiber 1 Copy A (41 KB)
             ├─ Fiber 1 Copy B (41 KB)
             └─ ... (16 fibers)
0x8615_0000  (end of checkpoint region, page-aligned up from 0x8614_2A00)
0x8800_0000  Fiber user memory (16 x 1 MB)

Syscall Interface

sys_checkpoint — Save

Syscall number: SYS_CHECKPOINT = 0x500  (survivability range; 0x200 is already VFS OPEN)
Arguments:
    a0 = blob_ptr ([*]const u8 or 0 for no blob)
    a1 = blob_len (u32, must be <= 32768; 0 for no blob)
Returns:
    a0 = 0 on success
    a0 = CHECKPOINT_TOO_LARGE (1) if blob_len > BLOB_MAX
    a0 = CHECKPOINT_DISABLED (2) if fiber has no checkpoint capability

Routing: L1 handler (Option A). The ecall follows the existing path: trap_entry → rss_trap_handler → k_handle_syscall in L1 (Nim). The SYS_CHECKPOINT case is dispatched within k_handle_syscall alongside the existing syscalls. The fiber is actively running when it calls sys_checkpoint(), so L1 is alive by definition. L1 calls down to checkpoint.zig for BLAKE3 computation and dual-copy writes via exported C ABI functions. The L1 handler:

  1. Captures the trap frame registers (already saved by trap_entry in entry_riscv.zig)
  2. Copies the fiber's stack from its user-mode stack region (with SUM access)
  3. If blob_ptr != 0: copies blob_len bytes from fiber's address space
  4. Records the CSpace epoch from cspace_get(fiber_id).epoch
  5. Computes BLAKE3 over the entire CheckpointData (excluding the hash field)
  6. Writes Copy A, then Copy B
  7. Emits CheckpointWrite(64) with data0 = xxh64_of_hash (quick fingerprint for telemetry)
  8. Returns to fiber via normal ecall return path (sepc += 4)

Restore Entry Declaration

Fibers that use application blobs declare a restore entry point in their .npk manifest (BKDL section):

kdl
manifest {
    name "nav-fiber"
    restore-entry 0x80001234    // Called instead of entry on warm restart
}

If restore-entry is not declared and a checkpoint with blob_size > 0 exists, the kernel falls back to cold restart (the blob is discarded). This is a safety measure — the fiber must explicitly opt in to blob restore.

on_restore Callback Convention

When the kernel calls restore_entry instead of the normal entry point:

a0 = blob_ptr — pointer to a kernel-provided read-only buffer containing the blob
a1 = blob_len — size of the blob

The fiber rebuilds its domain state from the blob, then resumes normal operation. The stack is already restored (call chain intact, local variables present). The fiber's registers are restored. The fiber's sepc points to the instruction after the sys_checkpoint() ecall — so if the fiber simply returns from on_restore, it continues execution from the point of the last checkpoint.

Simplified pattern for fiber authors:

c
typedef struct { double acc; uint32_t iter; } state_t;

// In the fiber's main loop:
while (true) {
    sensor_data = read_sensor();
    accumulator += process(sensor_data);

    if (iteration % 1000 == 0) {
        // Checkpoint: save accumulator state
        state_t state = { accumulator, iteration };
        sys_checkpoint(&state, sizeof(state));
    }
}

// Declared in manifest: restore-entry = on_restore
void on_restore(uint8_t* blob, uint32_t len) {
    if (len < sizeof(state_t)) return; // defensive: blob too small
    // Rebuild from blob
    const state_t* state = (const state_t*)blob;
    accumulator = state->acc;
    iteration = state->iter;
    // Stack is already restored — return continues the main loop
}

Restore Mechanics

In respawn_fiber() — Modified Flow

The existing 4-step respawn sequence becomes 5 steps:

Step 1: Sterilize     (unchanged — CSpace revoke, TLB flush)
Step 2: Verify         (unchanged — Ed25519 on .npk)
Step 3: Reload         (unchanged — fresh ELF from .npk)
Step 4: Restore        (NEW — checkpoint restore if available)
Step 5: Re-enter       (modified — conditional entry point)

Step 4: Restore (new)

zig
fn restore_from_checkpoint(fiber_id: u64) RestoreResult {
    // 1. Try Copy A
    const copy_a: *const CheckpointData = @ptrFromInt(checkpoint_addr(fiber_id, 0));
    if (verify_checkpoint(copy_a, fiber_id)) {
        apply_checkpoint(fiber_id, copy_a);
        return .Restored;
    }

    // 2. Try Copy B
    const copy_b: *const CheckpointData = @ptrFromInt(checkpoint_addr(fiber_id, 1));
    if (verify_checkpoint(copy_b, fiber_id)) {
        apply_checkpoint(fiber_id, copy_b);
        return .Restored;
    }

    // 3. Both failed
    _ = emit_stl(65, fiber_id, 0, 0, fiber_id, 0, 0); // CheckpointReject, reason=0 (hash_fail)
    invalidate_checkpoint(fiber_id); // Clear invalid data
    return .NoCheckpoint; // Fall back to cold restart
}

pub const RestoreResult = enum {
    Restored,       // Checkpoint applied successfully
    NoCheckpoint,   // No valid checkpoint — cold restart
};

verify_checkpoint

zig
fn verify_checkpoint(ckp: *const CheckpointData, fiber_id: u64) bool {
    if (ckp.magic != CHECKPOINT_MAGIC) return false;

    // Compute BLAKE3 of everything before the hash field
    const data_size = @offsetOf(CheckpointData, "hash");
    const data_ptr: [*]const u8 = @ptrCast(ckp);

    var hasher = std.crypto.hash.Blake3.init(.{});
    hasher.update(data_ptr[0..data_size]);
    var computed: [32]u8 = undefined;
    hasher.final(&computed);

    if (!std.mem.eql(u8, &computed, &ckp.hash)) return false;

    // Epoch staleness check: if CSpace epoch has advanced too far since checkpoint,
    // the capability state is too stale to trust. Threshold: 4 epoch bumps.
    const cspace = @import("cspace.zig");
    if (cspace.cspace_get(fiber_id)) |cs| {
        const delta = cs.epoch -% ckp.epoch;
        if (delta > EPOCH_STALENESS_THRESHOLD) return false;
    }

    return true;
}

pub const EPOCH_STALENESS_THRESHOLD: u32 = 4;

apply_checkpoint

zig
fn apply_checkpoint(fiber_id: u64, ckp: *const CheckpointData) void {
    // 1. Verify CSpace epoch is compatible
    //    If checkpoint epoch != current epoch, capabilities may have changed.
    //    For safety: re-grant from manifest (already done in Reload step).

    // 2. Restore stack to fiber's stack region
    const stack_base = get_fiber_load_base(fiber_id) + FIBER_STACK_OFFSET - ckp.stack_size;
    const dest: [*]u8 = @ptrFromInt(stack_base);
    @memcpy(dest[0..ckp.stack_size], ckp.stack_data[0..ckp.stack_size]);

    // 3. Store register state for re-enter step
    fiber_saved_regs[fiber_id] = ckp.regs;
    fiber_saved_sepc[fiber_id] = ckp.sepc;

    // 4. Update SP from saved registers (SP = x2 in RISC-V)
    fiber_stack_pointers[fiber_id] = ckp.regs[2]; // x2 = sp

    // 5. If blob present, copy to blob buffer in fiber's address space
    //    Blob buffer: last 32KB of the fiber's 1MB region, before the stack
    //    Address: FIBER_BASE + fiber_id * FIBER_MEM_SIZE + BLOB_OFFSET
    //    BLOB_OFFSET = FIBER_MEM_SIZE - 0x1000 (stack) - BLOB_MAX = 0xF7000
    if (ckp.blob_size > 0) {
        const blob_dest = get_fiber_load_base(fiber_id) + BLOB_OFFSET;
        const dest: [*]u8 = @ptrFromInt(blob_dest);
        @memcpy(dest[0..ckp.blob_size], ckp.blob_data[0..ckp.blob_size]);
        fiber_blob_ptr[fiber_id] = blob_dest;
        fiber_blob_len[fiber_id] = ckp.blob_size;
    }
}

const BLOB_OFFSET: usize = FIBER_MEM_SIZE - 0x1000 - BLOB_MAX; // 0xF7000

Step 5: Re-enter (modified)

zig
fn reenter(fiber_id: u64) void {
    const restored = fiber_restored[fiber_id]; // Set by Step 4

    if (restored and fiber_blob_len[fiber_id] > 0) {
        // Warm restart with blob — use restore_entry from manifest
        const re = npk_registry[fiber_id].restore_entry;
        if (re != 0) {
            // New function: enters userland with a0=blob_ptr, a1=blob_len
            // instead of a0=systable. Same sret mechanics as hal_enter_userland.
            hal_enter_userland_restore(
                re,                             // entry = restore_entry
                fiber_blob_ptr[fiber_id],       // a0 = blob_ptr
                fiber_blob_len[fiber_id],       // a1 = blob_len
                fiber_stack_pointers[fiber_id], // sp
            );
            return;
        }
        // No restore_entry declared — discard blob, cold restart
    }

    if (restored) {
        // Warm restart without blob — resume from saved PC
        // Stack and SP already restored by apply_checkpoint
        hal_enter_userland(
            fiber_saved_sepc[fiber_id],
            SYSTABLE_BASE,
            fiber_stack_pointers[fiber_id],
        );
        return;
    }

    // Cold restart — normal entry point
    hal_enter_userland(
        fiber_entry_points[fiber_id],
        SYSTABLE_BASE,
        fiber_stack_pointers[fiber_id],
    );
}

hal_enter_userland_restore (new function in entry_riscv.zig)

Identical to hal_enter_userland except a0 and a1 carry blob pointer/length instead of SysTable address:

zig
export fn hal_enter_userland_restore(entry: u64, blob_ptr: u64, blob_len: u64, sp: u64) callconv(.c) void {
    var kstack: usize = 0;
    asm volatile ("mv %[kstack], sp" : [kstack] "=r" (kstack));
    asm volatile (
        \\ li t0, 0x20            # sstatus.SPIE: re-enable interrupts after sret
        \\ csrs sstatus, t0
        \\ li t1, 0x100           # sstatus.SPP: clear so sret returns to U-mode
        \\ csrc sstatus, t1
        \\ li t2, 0x40000         # sstatus.SUM: allow kernel access to user pages
        \\ csrs sstatus, t2
        \\ csrw sepc, %[entry]
        \\ csrw sscratch, %[kstack]
        \\ mv sp, %[sp]
        \\ mv a0, %[blob_ptr]
        \\ mv a1, %[blob_len]
        \\ sret
        :
        : [entry] "r" (entry),
          [blob_ptr] "r" (blob_ptr),
          [blob_len] "r" (blob_len),
          [sp] "r" (sp),
          [kstack] "r" (kstack),
    );
}

Checkpoint Invalidation

A checkpoint becomes invalid when:

  1. The .npk image changes (firmware update). Old checkpoints reference code that no longer exists. respawn_set_npk() must clear the checkpoint region for that fiber.
  2. Manual invalidation via ground station command or L1 verdict ring.
  3. Dual-copy verification failure — both copies fail BLAKE3 → checkpoint is cleared to prevent future attempts.
  4. CSpace epoch delta exceeds threshold — if the fiber's CSpace has been revoked and re-granted many times since the checkpoint, the checkpoint's capability state may be too stale. Threshold: configurable, default 4 epoch bumps.
zig
pub fn invalidate_checkpoint(fiber_id: u64) void {
    // Zero the magic bytes of both copies — fast invalidation
    const copy_a: *CheckpointData = @ptrFromInt(checkpoint_addr(fiber_id, 0));
    const copy_b: *CheckpointData = @ptrFromInt(checkpoint_addr(fiber_id, 1));
    copy_a.magic = 0;
    copy_b.magic = 0;
    // reason codes: 0=hash_fail, 1=epoch_stale, 2=manual
    // Note: the hash-fail path in restore_from_checkpoint() emits its own
    // reason=0 event before calling in; this event records the invalidation.
    _ = emit_stl(65, fiber_id, 0, 0, fiber_id, 2, 0); // CheckpointReject, reason=manual
}

Performance Budget

| Operation | Budget | Notes |
|---|---|---|
| Save (kernel portion) | < 200 us | Copy 8KB stack + 32KB blob + BLAKE3 + dual write |
| Restore (kernel portion) | < 100 us | Verify BLAKE3 + copy stack + copy registers |
| Restore (fiber callback) | Fiber's problem | on_restore time is charged to the fiber's budget |
| BLAKE3 of 41KB | ~15 us | Zig std BLAKE3 on RISC-V @ 100MHz |
| Stack copy (8KB) | ~5 us | memcpy, cache-line aligned |
| Dual write (82KB) | ~50 us | Two sequential 41KB writes |

Save is infrequent (fiber-initiated, application-defined cadence). Restore happens only after a panic — latency is acceptable.

Ontology Events

| Event | Kind Value | Emitter | data0 | data1 | data2 |
|---|---|---|---|---|---|
| CheckpointWrite | 64 | L1 (save handler) | XXH64 fingerprint of BLAKE3 hash | blob_size | stack_size |
| CheckpointReject | 65 | L0 (restore) or L1 (invalidation) | fiber_id | reason (0=hash_fail, 1=epoch_stale, 2=manual) | — |
| CheckpointRegionInit | 66 | L0 (boot) | region_base (0x8600_0000) | region_size | — |

CheckpointRegionInit is emitted once at boot by init_region(), proving the checkpoint memory was explicitly zeroed this boot cycle. Any checkpoint whose timestamp predates this event is provably stale.

NPK Registry Extension

The NpkEntry in respawn.zig gains a restore_entry field:

zig
pub const NpkEntry = struct {
    fiber_id: u64,
    image_offset: u64,
    image_size: u64,
    pubkey: [32]u8,
    path_hash: u64,
    restore_entry: u64,  // NEW: 0 = no restore callback; nonzero = on_restore address
};

Populated at boot time from the .npk manifest's restore-entry field. Default 0 (cold restart only).

Files

New Files

| File | Purpose | Est. LOC | Layer |
|---|---|---|---|
| hal/checkpoint.zig | Checkpoint storage, format, verify, apply, invalidate | ~300 | L0 |

Modified Files

| File | Change |
|---|---|
| hal/respawn.zig | Integrate restore_from_checkpoint() into respawn flow (between Reload and Re-enter); modify reenter() for conditional entry point; remove CheckpointStatus stub (moved to checkpoint.zig) |
| hal/respawn.zig | Add restore_entry field to NpkEntry |
| hal/entry_riscv.zig | Route SYS_CHECKPOINT ecall to L1 handler |
| hal/abi.zig | Comptime import checkpoint.zig; FFI re-exports |
| core/kernel.nim | Add SYS_CHECKPOINT handler — capture registers, stack, blob, compute BLAKE3, dual-copy write, emit STL event |
| core/fiber.nim | Add checkpoint-related metadata (optional — or store in checkpoint.zig's own per-fiber table) |

Testing Strategy

Unit Tests (checkpoint.zig, zig build test)

  • Checkpoint format: @sizeOf(CheckpointData) comptime assert
  • Write + verify round-trip: create checkpoint, compute BLAKE3, verify passes
  • Corrupted checkpoint: flip one bit → verify fails
  • Dual-copy: corrupt Copy A, verify Copy B succeeds
  • Both copies corrupted → NoCheckpoint result
  • Magic byte invalidation: invalidate_checkpoint() → verify returns false
  • Blob cap enforcement: blob_len > BLOB_MAX → rejected
  • checkpoint_addr() arithmetic: fiber 0 copy 0, fiber 15 copy 1 → correct addresses
  • Epoch staleness: checkpoint epoch vs current epoch delta > threshold → reject

Integration Tests

  • Full save → panic → restore cycle: fiber checkpoints, inject fault, verify warm restart (stack restored, registers restored)
  • Save with blob → restore with blob: verify on_restore entry called with correct blob pointer and length
  • Save without blob → restore: verify normal entry point (PC from checkpoint) used
  • Dual-copy corruption: corrupt Copy A after save, inject fault, verify Copy B used
  • Both copies corrupted: inject fault → verify cold restart from .npk
  • Checkpoint after firmware update: change .npk, inject fault → verify checkpoint invalidated, cold restart used

Related Documents

  • Phase 6.2: Restart Trap — cold restart pipeline (sterilize, verify, reload, re-enter)
  • Phase 6.3: Adaptive FDIR — anomaly detection that triggers respawn
  • SPEC-020: Capability Algebra — CSpace epoch semantics during restore
  • SPEC-060: System Ontology — CheckpointWrite (64), CheckpointReject (65) events
  • SPEC-085: BEB dual-copy pattern — mirrors the dual-copy + independent hash approach