Checkpoint Warm Restart
Phase 6.4
Cold restart (Phase 6.2) reloads a fiber from its immutable .npk image — it starts from main(). Warm restart resumes from a saved snapshot — registers, call chain, and optionally domain state. The fiber picks up where it left off.
Design Principles
- The kernel saves the hard parts. Registers, stack, CSpace epoch — mechanical, deterministic, every fiber gets this for free. No fiber author involvement required.
- The fiber saves the domain parts. A navigation error accumulator, a partial spectrometer dataset, a protocol state machine — only the fiber knows what matters. It serializes its own state via a syscall.
- Save trigger is a syscall, not a timer. The fiber calls sys_checkpoint() at application-defined safe points. A fiber knows when its state is consistent better than any kernel heuristic. Timer-driven checkpoints risk snapshotting half-updated structures.
- Dual-copy with independent BLAKE3 hashes. A single corrupted checkpoint is worse than no checkpoint — you restore into corruption that doesn't crash, just silently misbehaves. Two copies, each verified. Both fail → cold restart. (BLAKE3 instead of XXH64 because checkpoint corruption leads to silent misbehavior in restored state — a stronger hash is warranted. The BEB uses XXH64 because its threat model is bitflip detection, not prevention of silent restore into corrupted state.)
- 32KB hard cap on application blob. Non-negotiable. A runaway fiber cannot blow the memory budget. The kernel returns CHECKPOINT_TOO_LARGE; the fiber decides what to cut.
Architecture
Fiber calls sys_checkpoint(blob_ptr, blob_len)
↓
L1 captures trap frame (registers) + copies stack + copies blob
↓
L1 computes BLAKE3 over entire checkpoint
↓
L1 writes to Copy A, then Copy B (dual-copy)
↓
L1 emits CheckpointWrite(64) STL event
↓
Returns to fiber — zero downtime
─── later, on panic ───
respawn_fiber() called
↓
Sterilize (CSpace revoke, TLB flush)
↓
Verify .npk signature (Ed25519)
↓
Reload ELF from immutable image (fresh code)
↓
check_checkpoint(fiber_id)
├─ Both copies fail BLAKE3 → CheckpointReject(65), cold restart
├─ No checkpoint exists → cold restart
└─ Valid checkpoint found:
↓
Restore stack from checkpoint
Restore registers from checkpoint
↓
blob_size > 0?
├─ No → re-enter at normal entry point (registers set PC)
└─ Yes → re-enter at restore_entry with (blob_ptr, blob_len) in a0/a1
Key insight: Warm restart still reloads the ELF. The code is always verified fresh from the immutable image. What the checkpoint saves is the execution context (where was I, what was on my stack) and domain state (what application data did I need). Code integrity is never compromised.
Checkpoint Format
Per-Fiber Checkpoint Blob
Fixed-size structure. Two independent copies per fiber, each with its own BLAKE3 hash.
pub const CHECKPOINT_MAGIC: u32 = 0x434B5021; // "CKP!"
pub const STACK_MAX: usize = 8192; // 8KB stack snapshot (current fiber stack is 4KB/0x1000;
// 8KB is forward-looking for expanded stacks in later phases.
// stack_size field records actual bytes captured, <= STACK_MAX)
pub const BLOB_MAX: usize = 32768; // 32KB application blob cap
pub const REGS_COUNT: usize = 32; // 32 GPRs (RISC-V)
pub const CheckpointData = extern struct {
// Header (24 bytes)
magic: u32, // "CKP!" — validity marker
fiber_id: u16, // Owning fiber (truncated from u64; assert fiber_id < MAX_FIBERS before cast)
_pad_hdr: u16, // Alignment
epoch: u32, // CSpace epoch at checkpoint time
_pad_epoch: u32, // Alignment
timestamp_ns: u64, // When the checkpoint was created
// Register state (272 bytes)
regs: [REGS_COUNT]u64, // x0–x31 (x0 is always zero but stored for simplicity)
sepc: u64, // Program counter at checkpoint
sstatus: u64, // Saved status register
// Stack snapshot (8196 bytes)
stack_size: u32, // Actual stack bytes captured (<= STACK_MAX)
stack_data: [STACK_MAX]u8,
// Application blob (32772 bytes)
blob_size: u32, // 0 if no application blob; <= BLOB_MAX
blob_data: [BLOB_MAX]u8,
// Integrity (32 bytes)
hash: [32]u8, // BLAKE3 of all bytes before this field
};
Size per checkpoint: 24 + 272 + 8196 + 32772 + 32 = 41,296 bytes (~40.3 KB).
Memory budget: 16 fibers x 2 copies x 41,296 bytes = 1,321,472 bytes (~1.26 MB).
Storage Region
Dedicated compile-time known physical memory region. NOT carved from the fiber's 1MB user space — checkpoint memory is kernel-only, inaccessible to userland.
pub const CHECKPOINT_REGION_BASE: usize = 0x8600_0000; // Below FIBER_BASE (0x8800_0000)
pub const CHECKPOINT_COPY_SIZE: usize = @sizeOf(CheckpointData);
pub const CHECKPOINT_REGION_SIZE: usize = MAX_FIBERS * 2 * CHECKPOINT_COPY_SIZE;
// ~1.26 MB total
/// Get pointer to checkpoint copy for a fiber.
/// copy_idx: 0 = Copy A, 1 = Copy B
pub fn checkpoint_addr(fiber_id: u64, copy_idx: u1) usize {
return CHECKPOINT_REGION_BASE
+ @as(usize, fiber_id) * 2 * CHECKPOINT_COPY_SIZE
+ @as(usize, copy_idx) * CHECKPOINT_COPY_SIZE;
}
Memory map (updated):
0x8000_0000 SysTable
0x8600_0000 Checkpoint Region (1.26 MB, kernel-only)
├─ Fiber 0 Copy A (41 KB)
├─ Fiber 0 Copy B (41 KB)
├─ Fiber 1 Copy A (41 KB)
├─ Fiber 1 Copy B (41 KB)
└─ ... (16 fibers)
0x8615_0000 (end of checkpoint region, rounded up to a 64 KiB boundary from 0x8614_2A00)
0x8800_0000 Fiber user memory (16 x 1 MB)
Syscall Interface
sys_checkpoint — Save
Syscall number: SYS_CHECKPOINT = 0x500 (survivability range; 0x200 is already VFS OPEN)
Arguments:
a0 = blob_ptr ([*]const u8 or 0 for no blob)
a1 = blob_len (u32, must be <= 32768; 0 for no blob)
Returns:
a0 = 0 on success
a0 = CHECKPOINT_TOO_LARGE (1) if blob_len > BLOB_MAX
a0 = CHECKPOINT_DISABLED (2) if fiber has no checkpoint capability
Routing: L1 handler (Option A). The ecall follows the existing path: trap_entry → rss_trap_handler → k_handle_syscall in L1 (Nim). The SYS_CHECKPOINT case is dispatched within k_handle_syscall alongside existing syscalls. The fiber is actively running when it calls sys_checkpoint(), so L1 is alive by definition. L1 calls down to checkpoint.zig for BLAKE3 computation and dual-copy writes via exported C ABI functions. The L1 handler:
- Captures the trap frame registers (already saved by trap_entry in entry_riscv.zig)
- Copies the fiber's stack from its user-mode stack region (with SUM access)
- If blob_ptr != 0: copies blob_len bytes from the fiber's address space
- Records the CSpace epoch from cspace_get(fiber_id).epoch
- Computes BLAKE3 over the entire CheckpointData (excluding the hash field)
- Writes Copy A, then Copy B
- Emits CheckpointWrite(64) with data0 = xxh64_of_hash (quick fingerprint for telemetry)
- Returns to the fiber via the normal ecall return path (sepc += 4)
Restore Entry Declaration
Fibers that use application blobs declare a restore entry point in their .npk manifest (BKDL section):
manifest {
name "nav-fiber"
restore-entry 0x80001234 // Called instead of entry on warm restart
}
If restore-entry is not declared and a checkpoint with blob_size > 0 exists, the kernel falls back to cold restart (the blob is discarded). This is a safety measure — the fiber must explicitly opt in to blob restore.
on_restore Callback Convention
When the kernel calls restore_entry instead of the normal entry point:
a0 = blob_ptr — pointer to a kernel-provided read-only buffer containing the blob
a1 = blob_len — size of the blob
The fiber rebuilds its domain state from the blob, then resumes normal operation. The stack is already restored (call chain intact, local variables present). The fiber's registers are restored. The fiber's sepc points to the instruction after the sys_checkpoint() ecall — so if the fiber simply returns from on_restore, it continues execution from the point of the last checkpoint.
Simplified pattern for fiber authors:
typedef struct { double acc; uint32_t iter; } state_t;

// In the fiber's main loop:
for (;;) {
    sensor_data = read_sensor();
    accumulator += process(sensor_data);
    iteration++;
    if (iteration % 1000 == 0) {
        // Checkpoint: save accumulator state at a known-consistent point
        state_t state = { accumulator, iteration };
        sys_checkpoint(&state, sizeof(state));
    }
}

// Declared in manifest: restore-entry = on_restore
void on_restore(uint8_t* blob, uint32_t len) {
    // Rebuild domain state from the blob
    accumulator = ((state_t*)blob)->acc;
    iteration = ((state_t*)blob)->iter;
    // Stack is already restored — returning continues the main loop
}
Restore Mechanics
In respawn_fiber() — Modified Flow
The existing 4-step respawn sequence becomes 5 steps:
Step 1: Sterilize (unchanged — CSpace revoke, TLB flush)
Step 2: Verify (unchanged — Ed25519 on .npk)
Step 3: Reload (unchanged — fresh ELF from .npk)
Step 4: Restore (NEW — checkpoint restore if available)
Step 5: Re-enter (modified — conditional entry point)
Step 4: Restore (new)
fn restore_from_checkpoint(fiber_id: u64) RestoreResult {
// 1. Try Copy A
const copy_a = checkpoint_ptr(fiber_id, 0);
if (verify_checkpoint(copy_a, fiber_id)) {
apply_checkpoint(fiber_id, copy_a);
return .Restored;
}
// 2. Try Copy B
const copy_b = checkpoint_ptr(fiber_id, 1);
if (verify_checkpoint(copy_b, fiber_id)) {
apply_checkpoint(fiber_id, copy_b);
return .Restored;
}
// 3. Both failed
_ = emit_stl(65, fiber_id, 0, 0, fiber_id, 0, 0); // CheckpointReject, reason=0 (hash_fail)
invalidate_checkpoint(fiber_id); // Clear invalid data
return .NoCheckpoint; // Fall back to cold restart
}
pub const RestoreResult = enum {
Restored, // Checkpoint applied successfully
NoCheckpoint, // No valid checkpoint — cold restart
};
verify_checkpoint
fn verify_checkpoint(ckp: *const CheckpointData, fiber_id: u64) bool {
if (ckp.magic != CHECKPOINT_MAGIC) return false;
// Compute BLAKE3 of everything before the hash field
const data_size = @offsetOf(CheckpointData, "hash");
const data_ptr: [*]const u8 = @ptrCast(ckp);
var hasher = std.crypto.hash.Blake3.init(.{});
hasher.update(data_ptr[0..data_size]);
var computed: [32]u8 = undefined;
hasher.final(&computed);
if (!std.mem.eql(u8, &computed, &ckp.hash)) return false;
// Epoch staleness check: if CSpace epoch has advanced too far since checkpoint,
// the capability state is too stale to trust. Threshold: 4 epoch bumps.
const cspace = @import("cspace.zig");
if (cspace.cspace_get(fiber_id)) |cs| {
const delta = cs.epoch -% ckp.epoch;
if (delta > EPOCH_STALENESS_THRESHOLD) return false;
}
return true;
}
pub const EPOCH_STALENESS_THRESHOLD: u32 = 4;
apply_checkpoint
fn apply_checkpoint(fiber_id: u64, ckp: *const CheckpointData) void {
// 1. Verify CSpace epoch is compatible
// If checkpoint epoch != current epoch, capabilities may have changed.
// For safety: re-grant from manifest (already done in Reload step).
// 2. Restore stack to fiber's stack region
const stack_base = get_fiber_load_base(fiber_id) + FIBER_STACK_OFFSET - ckp.stack_size;
const dest: [*]u8 = @ptrFromInt(stack_base);
@memcpy(dest[0..ckp.stack_size], ckp.stack_data[0..ckp.stack_size]);
// 3. Store register state for re-enter step
fiber_saved_regs[fiber_id] = ckp.regs;
fiber_saved_sepc[fiber_id] = ckp.sepc;
// 4. Update SP from saved registers (SP = x2 in RISC-V)
fiber_stack_pointers[fiber_id] = ckp.regs[2]; // x2 = sp
// 5. If blob present, copy to blob buffer in fiber's address space
// Blob buffer: last 32KB of the fiber's 1MB region, before the stack
// Address: FIBER_BASE + fiber_id * FIBER_MEM_SIZE + BLOB_OFFSET
// BLOB_OFFSET = FIBER_MEM_SIZE - 0x1000 (stack) - BLOB_MAX = 0xF7000
if (ckp.blob_size > 0) {
const blob_dest = get_fiber_load_base(fiber_id) + BLOB_OFFSET;
const dest: [*]u8 = @ptrFromInt(blob_dest);
@memcpy(dest[0..ckp.blob_size], ckp.blob_data[0..ckp.blob_size]);
fiber_blob_ptr[fiber_id] = blob_dest;
fiber_blob_len[fiber_id] = ckp.blob_size;
}
}
const BLOB_OFFSET: usize = FIBER_MEM_SIZE - 0x1000 - BLOB_MAX; // 0xF7000
Step 5: Re-enter (modified)
fn reenter(fiber_id: u64) void {
const restored = fiber_restored[fiber_id]; // Set by Step 4
if (restored and fiber_blob_len[fiber_id] > 0) {
// Warm restart with blob — use restore_entry from manifest
const re = npk_registry[fiber_id].restore_entry;
if (re != 0) {
// New function: enters userland with a0=blob_ptr, a1=blob_len
// instead of a0=systable. Same sret mechanics as hal_enter_userland.
hal_enter_userland_restore(
re, // entry = restore_entry
fiber_blob_ptr[fiber_id], // a0 = blob_ptr
fiber_blob_len[fiber_id], // a1 = blob_len
fiber_stack_pointers[fiber_id], // sp
);
return;
}
// No restore_entry declared — discard blob, cold restart
}
if (restored) {
// Warm restart without blob — resume from saved PC
// Stack and SP already restored by apply_checkpoint
hal_enter_userland(
fiber_saved_sepc[fiber_id],
SYSTABLE_BASE,
fiber_stack_pointers[fiber_id],
);
return;
}
// Cold restart — normal entry point
hal_enter_userland(
fiber_entry_points[fiber_id],
SYSTABLE_BASE,
fiber_stack_pointers[fiber_id],
);
}
hal_enter_userland_restore (new function in entry_riscv.zig)
Identical to hal_enter_userland except a0 and a1 carry blob pointer/length instead of SysTable address:
export fn hal_enter_userland_restore(entry: u64, blob_ptr: u64, blob_len: u64, sp: u64) callconv(.c) void {
var kstack: usize = 0;
asm volatile ("mv %[kstack], sp" : [kstack] "=r" (kstack));
asm volatile (
\\ li t0, 0x20
\\ csrs sstatus, t0
\\ li t1, 0x100
\\ csrc sstatus, t1
\\ li t2, 0x40000
\\ csrs sstatus, t2
\\ csrw sepc, %[entry]
\\ csrw sscratch, %[kstack]
\\ mv sp, %[sp]
\\ mv a0, %[blob_ptr]
\\ mv a1, %[blob_len]
\\ sret
:
: [entry] "r" (entry),
[blob_ptr] "r" (blob_ptr),
[blob_len] "r" (blob_len),
[sp] "r" (sp),
[kstack] "r" (kstack),
);
}
Checkpoint Invalidation
A checkpoint becomes invalid when:
- The .npk image changes (firmware update). Old checkpoints reference code that no longer exists. respawn_set_npk() must clear the checkpoint region for that fiber.
- Manual invalidation via ground station command or L1 verdict ring.
- Dual-copy verification failure — both copies fail BLAKE3 → checkpoint is cleared to prevent future attempts.
- CSpace epoch delta exceeds threshold — if the fiber's CSpace has been revoked and re-granted many times since the checkpoint, the checkpoint's capability state may be too stale. Threshold: configurable, default 4 epoch bumps.
pub fn invalidate_checkpoint(fiber_id: u64) void {
// Zero the magic bytes of both copies — fast invalidation
const copy_a: *CheckpointData = @ptrFromInt(checkpoint_addr(fiber_id, 0));
const copy_b: *CheckpointData = @ptrFromInt(checkpoint_addr(fiber_id, 1));
copy_a.magic = 0;
copy_b.magic = 0;
// reason codes: 0=hash_fail, 1=epoch_stale, 2=manual
// Note: this function always reports reason=manual; the restore path emits
// its own reason=0 event before calling here, so hash failures are still logged.
_ = emit_stl(65, fiber_id, 0, 0, fiber_id, 2, 0); // CheckpointReject, reason=manual
}
Performance Budget
| Operation | Budget | Notes |
|---|---|---|
| Save (kernel portion) | < 200 us | Copy 8KB stack + 32KB blob + BLAKE3 + dual write |
| Restore (kernel portion) | < 100 us | Verify BLAKE3 + copy stack + copy registers |
| Restore (fiber callback) | Fiber's problem | on_restore time is charged to the fiber's budget |
| BLAKE3 of 41KB | ~15 us | Zig std BLAKE3 on RISC-V @ 100MHz |
| Stack copy (8KB) | ~5 us | memcpy, cache-line aligned |
| Dual write (82KB) | ~50 us | Two sequential 41KB writes |
Save is infrequent (fiber-initiated, application-defined cadence). Restore happens only after a panic — latency is acceptable.
Ontology Events
| EventKind | Value | Emitter | data0 | data1 | data2 |
|---|---|---|---|---|---|
| CheckpointWrite | 64 | L1 (save handler) | XXH64 fingerprint of BLAKE3 hash | blob_size | stack_size |
| CheckpointReject | 65 | L0 (restore) or L1 (invalidation) | fiber_id | reason (0=hash_fail, 1=epoch_stale, 2=manual) | — |
| CheckpointRegionInit | 66 | L0 (boot) | region_base (0x8600_0000) | region_size | — |
CheckpointRegionInit is emitted once at boot by init_region(), proving the checkpoint memory was explicitly zeroed this boot cycle. Any checkpoint whose timestamp predates this event is provably stale.
NPK Registry Extension
The NpkEntry in respawn.zig gains a restore_entry field:
pub const NpkEntry = struct {
fiber_id: u64,
image_offset: u64,
image_size: u64,
pubkey: [32]u8,
path_hash: u64,
restore_entry: u64, // NEW: 0 = no restore callback; nonzero = on_restore address
};
Populated at boot time from the .npk manifest's restore-entry field. Default 0 (cold restart only).
Files
New Files
| File | Purpose | Est. LOC | Layer |
|---|---|---|---|
| hal/checkpoint.zig | Checkpoint storage, format, verify, apply, invalidate | ~300 | L0 |
Modified Files
| File | Change |
|---|---|
| hal/respawn.zig | Integrate restore_from_checkpoint() into respawn flow (between Reload and Re-enter); modify reenter() for conditional entry point; remove CheckpointStatus stub (moved to checkpoint.zig) |
| hal/respawn.zig | Add restore_entry field to NpkEntry |
| hal/entry_riscv.zig | Route SYS_CHECKPOINT ecall to L1 handler |
| hal/abi.zig | Comptime import checkpoint.zig; FFI re-exports |
| core/kernel.nim | Add SYS_CHECKPOINT handler — capture registers, stack, blob, compute BLAKE3, dual-copy write, emit STL event |
| core/fiber.nim | Add checkpoint-related metadata (optional — or store in checkpoint.zig's own per-fiber table) |
Testing Strategy
Unit Tests (checkpoint.zig, zig build test)
- Checkpoint format: @sizeOf(CheckpointData) comptime assert
- Write + verify round-trip: create checkpoint, compute BLAKE3, verify passes
- Corrupted checkpoint: flip one bit → verify fails
- Dual-copy: corrupt Copy A, verify Copy B succeeds
- Both copies corrupted → NoCheckpoint result
- Magic byte invalidation: invalidate_checkpoint() → verify returns false
- Blob cap enforcement: blob_len > BLOB_MAX → rejected
- checkpoint_addr() arithmetic: fiber 0 copy 0, fiber 15 copy 1 → correct addresses
- Epoch staleness: checkpoint epoch vs current epoch delta > threshold → reject
Integration Tests
- Full save → panic → restore cycle: fiber checkpoints, inject fault, verify warm restart (stack restored, registers restored)
- Save with blob → restore with blob: verify on_restore entry called with correct blob pointer and length
- Save without blob → restore: verify normal entry point (PC from checkpoint) used
- Dual-copy corruption: corrupt Copy A after save, inject fault, verify Copy B used
- Both copies corrupted: inject fault → verify cold restart from .npk
- Checkpoint after firmware update: change .npk, inject fault → verify checkpoint invalidated, cold restart used
Related Specifications
- Phase 6.2: Restart Trap — cold restart pipeline (sterilize, verify, reload, re-enter)
- Phase 6.3: Adaptive FDIR — anomaly detection that triggers respawn
- SPEC-020: Capability Algebra — CSpace epoch semantics during restore
- SPEC-060: System Ontology — CheckpointWrite (64), CheckpointReject (65) events
- SPEC-085: BEB dual-copy pattern — mirrors the dual-copy + independent hash approach