
Autonomous FDIR — Adaptive Dual-Layer Anomaly Detection

Phase 6.3

Fault Detection, Isolation, and Recovery (FDIR) with independent multi-layer verification. Two observers – L0 (Zig) and L1 (Nim) – reach conclusions independently, neither trusting the other. Maps to ECSS-E-ST-70-41C requirements for onboard autonomy.

Design Principles

  1. L0 verdicts are authoritative. If L0 says "quarantine fiber X" and L1 disagrees, L0 wins. Always. L1 can recommend actions via the verdict ring; L0 has veto power.
  2. Trust flows down. Hardware timer → L0 → L1 → userland. Information flows up, authority flows down.
  3. Zero cost when disabled. Every FDIR check gates on a cached feature mask. Disabled features cost one register comparison – not a function call, not a branch prediction miss.
  4. Boot profile is the floor. Runtime escalation can raise the paranoia level but never drop below the boot profile. A ground station cannot remotely disable your heartbeat monitor.
  5. Adaptive, not static. The system escalates its own FDIR profile based on observed conditions. A laptop that starts seeing ECC errors auto-escalates. A satellite in calm space doesn't waste cycles on full TMR voting.

Boot Profiles

Set in boot KDL. Determines the initial feature mask and the de-escalation floor.

kdl
survivability {
    profile "sovereign"
    // profile "hardened"
    // profile "radiation"
}

Feature Matrix

| Bit | Feature | Sovereign | Hardened | Radiation |
|---|---|---|---|---|
| 0 | Restart Trap (trap.zig) | on | on | on |
| 1 | Circuit Breaker | on | on | on |
| 2 | BEB Logging | on | on | on |
| 3 | L1 Heartbeat Monitor | off | on | on |
| 4 | Baseline Learning (L1) | off | on | on |
| 5 | CSpace Integrity Audit | off | off | on |
| 6 | Panic Correlator | off | off | on |
| 7 | Verdict Ring | off | off | on |

zig
pub const Profile = enum(u8) {
    sovereign = 0b0000_0111,
    hardened  = 0b0001_1111,
    radiation = 0b1111_1111,
};

// INVARIANT: Profiles must be strict supersets (numerically increasing).
// Each higher profile enables all bits of the lower profiles plus additional ones.
// This enables simple numeric comparison for escalation/de-escalation.
comptime {
    std.debug.assert(@intFromEnum(Profile.sovereign) < @intFromEnum(Profile.hardened));
    std.debug.assert(@intFromEnum(Profile.hardened) < @intFromEnum(Profile.radiation));
    // Every higher profile is a bitmask superset of the lower:
    const s = @intFromEnum(Profile.sovereign);
    const h = @intFromEnum(Profile.hardened);
    const r = @intFromEnum(Profile.radiation);
    std.debug.assert((h & s) == s); // hardened ⊇ sovereign
    std.debug.assert((r & h) == h); // radiation ⊇ hardened
}

pub var feature_mask: u8 = @intFromEnum(Profile.sovereign);
pub var boot_floor: u8 = @intFromEnum(Profile.sovereign);

/// De-escalation step mapping. Profiles form an ordered chain.
/// Returns the next lower profile, or the same profile if already at minimum.
pub fn deescalate_one_step(mask: u8) u8 {
    if (mask == @intFromEnum(Profile.radiation)) return @intFromEnum(Profile.hardened);
    if (mask == @intFromEnum(Profile.hardened)) return @intFromEnum(Profile.sovereign);
    return mask; // Already at sovereign or unknown — no change
}

The mask is read once per check path and cached in a register. Feature checks are single-instruction:

zig
inline fn feature_enabled(bit: u3) bool {
    return (feature_mask >> bit) & 1 != 0;
}

/// Export for L1 (Nim) to read the current feature mask for gating its own checks.
pub export fn anomaly_get_feature_mask() u8 {
    return feature_mask;
}

Configurable Thresholds

Profiles carry default thresholds; missions override them in boot KDL. Different hardware, different missions, different worlds, different settings.

kdl
survivability {
    profile "radiation"

    thresholds {
        heartbeat-warn-ns       500000000    // 500ms (default for radiation)
        heartbeat-soft-mult     5            // soft restart at 5× warn
        heartbeat-kexec-mult    20           // kexec at 20× warn
        correlation-window-ns   100000000    // 100ms panic correlation window
        correlation-threshold   3            // minimum distinct fibers
        audit-interval-ticks    1024         // CSpace audit every N heartbeats
        baseline-lock-samples   1000         // lock baseline after N samples
        deviation-sigma         3            // BehaviorViolation threshold (σ)
        deescalation-cooldown-s 300          // clean window before de-escalation
        hold-timeout-s          21600        // max FDIR_HOLD duration (6 hours)
    }
}

All thresholds have compile-time defaults per profile. KDL overrides are applied at anomaly_init() time. L0 stores them in a static FdirConfig struct – no heap, no parsing after boot. KDL values suffixed with -s (seconds) are multiplied by 1_000_000_000 during anomaly_init() to convert to the nanosecond representation used internally.

zig
pub const FdirConfig = struct {
    heartbeat_warn_ns: u64,
    heartbeat_soft_mult: u8,
    heartbeat_kexec_mult: u8,
    correlation_window_ns: u64,
    correlation_threshold: u8,
    audit_interval_ticks: u32,
    baseline_lock_samples: u32,
    deviation_sigma: u8,
    deescalation_cooldown_ns: u64,
    hold_timeout_ns: u64,
    l1_correlation_window_ns: u64,  // L1 cross-fiber correlation window (default 500ms)
};

Default configurations per profile:

| Threshold | Sovereign | Hardened | Radiation |
|---|---|---|---|
| heartbeat_warn_ns | n/a | 1_000_000_000 (1s) | 500_000_000 (500ms) |
| heartbeat_soft_mult | n/a | 5 | 5 |
| heartbeat_kexec_mult | n/a | 20 | 20 |
| correlation_window_ns | n/a | n/a | 100_000_000 (100ms) |
| correlation_threshold | n/a | n/a | 3 |
| audit_interval_ticks | n/a | n/a | 1024 (~10s) |
| baseline_lock_samples | n/a | 1000 | 1000 |
| deviation_sigma | n/a | 3 | 3 |
| deescalation_cooldown_ns | n/a | 300_000_000_000 (300s) | 300_000_000_000 (300s) |
| hold_timeout_ns | n/a | n/a | 21_600_000_000_000 (6h) |
| l1_correlation_window_ns | n/a | 500_000_000 (500ms) | 500_000_000 (500ms) |

Architecture

┌─────────────────────────────────────────────────────────────┐
│  USERLAND                                                   │
│  ┌───────────────────────────────────────────┐              │
│  │ Membrane Fiber                            │              │
│  │ Consumes AnomalyDetected events           │              │
│  │ Relays telemetry to ground station        │              │
│  │ Forwards FDIR commands → ION → L1         │              │
│  └───────────────────┬───────────────────────┘              │
├──────────────────────┼──────────────────────────────────────┤
│  L1 KERNEL (Nim)     │                                      │
│  ┌───────────────────▼───────────────────────┐              │
│  │ Behavioral Analyst (baseline.nim)         │              │
│  │ Welford baselines per fiber               │              │
│  │ Cross-fiber correlation                   │              │
│  │ Writes recommendations → Verdict Ring     │              │
│  └───────────────────┬───────────────────────┘              │
│                      │ verdict ring (shared memory)         │
├──────────────────────┼──────────────────────────────────────┤
│  L0 HAL (Zig)        ▼                                      │
│  ┌───────────────────────────────────────────┐              │
│  │ Paranoid Watchdog (anomaly.zig)           │              │
│  │                                           │              │
│  │  ┌─────────────┐  ┌──────────────┐        │              │
│  │  │   Panic     │  │   CSpace     │        │              │
│  │  │ Correlator  │  │  Integrity   │        │              │
│  │  │  (bit 6)    │  │ Audit (bit 5)│        │              │
│  │  └─────────────┘  └──────────────┘        │              │
│  │  ┌─────────────┐  ┌──────────────┐        │              │
│  │  │ Heartbeat   │  │ Verdict Ring │        │              │
│  │  │  Monitor    │  │   Reader     │        │              │
│  │  │  (bit 3)    │  │  (bit 7)     │        │              │
│  │  └─────────────┘  └──────────────┘        │              │
│  │  ┌────────────────────────────────┐       │              │
│  │  │ Profile Manager                │       │              │
│  │  │ feature_mask + escalation      │       │              │
│  │  └────────────────────────────────┘       │              │
│  └───────────────────────────────────────────┘              │
└─────────────────────────────────────────────────────────────┘

L0 Paranoid Watchdog (hal/anomaly.zig)

Temporal Panic Correlator (bit 6)

Detects correlated multi-fiber panics that indicate environmental events (radiation) rather than independent software bugs.

zig
const PanicRecord = struct {
    fiber_id: u64,
    timestamp_ns: u64,
};

pub const PanicCorrelator = struct {
    recent: [32]PanicRecord,
    head: u8,
    /// Number of valid entries in the ring. Capped at 32 (ring capacity).
    /// This is the walk depth, not a lifetime counter — it never overflows.
    count: u8,  // invariant: count <= 32
    /// Radiation suppression window end timestamp. While active,
    /// circuit breaker quarantine is overridden to Respawn.
    radiation_suppression_ns: u64,
};

Called from npl_trap_handler after every panic. Appends to the ring (wrapping at 32), caps count at 32. Walks the ring backward counting distinct fiber IDs within the correlation window.
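
The backward walk can be sketched in C (the struct mirrors PanicRecord above; the ring size constant and function name are illustrative, not the actual anomaly.zig API):

```c
#include <stdint.h>
#include <stdbool.h>

#define RING 32

typedef struct { uint64_t fiber_id; uint64_t timestamp_ns; } PanicRecord;

/* Count distinct fiber IDs among records within [now - window_ns, now],
   walking backward from the newest entry. `head` is the next write slot. */
static uint32_t distinct_fibers_in_window(const PanicRecord *ring, uint8_t head,
                                          uint8_t count, uint64_t now,
                                          uint64_t window_ns) {
    uint64_t seen[RING];
    uint32_t n_seen = 0;
    uint32_t idx = (head + RING - 1) % RING;
    for (uint8_t i = 0; i < count; i++) {
        const PanicRecord *r = &ring[idx];
        if (now - r->timestamp_ns > window_ns) break; /* older entries are out of window */
        bool dup = false;
        for (uint32_t j = 0; j < n_seen; j++)
            if (seen[j] == r->fiber_id) { dup = true; break; }
        if (!dup) seen[n_seen++] = r->fiber_id;
        idx = (idx + RING - 1) % RING;
    }
    return n_seen;
}
```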

Correlation vs. independent failure:

| Observation | Diagnosis | Response |
|---|---|---|
| ≥ N distinct fibers panic within window | Correlated radiation event | Suppress individual quarantine, emit AnomalyDetected(80) with data0 = fiber_count, data1 = CORRELATION_RADIATION |
| Single fiber repeated panics | Independent bug | Per-fiber circuit breaker as usual |

When a correlated event is detected, L0 sets a radiation_suppression_ns timestamp. For the next correlation window duration, the circuit breaker's quarantine verdict is overridden to Respawn – the fibers are fine, the environment is hostile. Individual panic logging to BEB continues unchanged.

CSpace Integrity Audit (bit 5)

Periodic walk of all MAX_FIBERS CSpaces × CSPACE_SIZE slots.

| Check | Detects |
|---|---|
| Epoch monotonically non-decreasing between audits | Epoch counter corruption |
| No Null-typed cap with non-zero object_id | Ghost capabilities |
| No bounds where end < start | Boundary inversion (bitflip) |
| No cap_type value outside the CapType enum range (0–6) | Type field corruption |

Triggered by heartbeat tick counter at the configurable audit_interval_ticks (default ~1024 ticks ≈ 10s).

On failure:

  1. Emit IntegrityAudit(83) with data0 = fiber_id, data1 = slot_index, data2 = check_type
  2. Log IntegrityFault(17) to BEB for the owning fiber
  3. Quarantine the owning fiber

Previous epoch values are stored in audit_epochs: [MAX_FIBERS]u32 and updated after each clean audit pass.

Epoch wrapping: CSpace revoke_all() uses epoch +%= 1 (wrapping add). At u32, this wraps after ~4 billion revocations – not a realistic concern in practice. The audit checks current_epoch != audit_epochs[fiber_id] (not >=) to detect any change. If the epoch changed but the CSpace is consistent, the audit passes and records the new epoch. An unexpected epoch change (no corresponding FiberRespawn event in the STL) is flagged as corruption.
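
The epoch decision described here can be sketched in C, under the assumption that the audit knows whether a FiberRespawn event was logged since the last pass (all names hypothetical):

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum { AUDIT_CLEAN, AUDIT_EPOCH_CHANGED_OK, AUDIT_CORRUPT } AuditResult;

/* Compare with != (not >=) so wrapped epochs are still detected as changes.
   A change backed by a FiberRespawn event is legitimate; otherwise corruption. */
static AuditResult audit_epoch(uint32_t current_epoch, uint32_t *recorded_epoch,
                               bool respawn_seen_in_stl) {
    if (current_epoch == *recorded_epoch) return AUDIT_CLEAN;
    if (respawn_seen_in_stl) {
        *recorded_epoch = current_epoch;  /* record the new stable epoch */
        return AUDIT_EPOCH_CHANGED_OK;
    }
    return AUDIT_CORRUPT;                 /* unexpected epoch change */
}
```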

L1 Heartbeat Monitor (bit 3)

Two distinct operations, two distinct call sites:

  1. L1 writes the heartbeat. L1's scheduler calls anomaly_heartbeat() every tick. This is an exported C ABI function that L1 invokes — it updates heartbeat_seq and last_heartbeat_ns in shared L0 memory. L1 is the writer; it proves it's alive by advancing the sequence.
  2. L0 checks the heartbeat. In entry_riscv.zig, the timer interrupt handler (intr_id == 5 path) calls anomaly_check_heartbeat() — a pure L0 function that reads the heartbeat state and compares against rumpk_timer_now_ns() (the canonical L0 time source). L0 is the reader; it independently verifies L1's liveness.

L0 checks both a timestamp and a sequence number independently.

zig
pub const HeartbeatMonitor = struct {
    last_heartbeat_ns: u64,
    heartbeat_seq: u64,
    last_checked_seq: u64,
    last_checked_ns: u64,
};

Why sequence number + timestamp (not just timestamp):

last_heartbeat_ns alone cannot distinguish "L1 is alive but the clock source is corrupted" from "L1 is alive and ticking normally." If the timer register gets hit by a bitflip and starts returning garbage timestamps, you see either a massive gap (false kexec) or no gap at all (false calm).

L1 increments heartbeat_seq every tick. L0 checks both:

  • Sequence must advance AND timestamp must be plausible
  • If sequence advances but timestamp is stuck or jumping → clock corruption event → recalibrate clock source, do NOT restart L1
  • If sequence is stuck AND timestamp is stuck → L1 hang → escalation ladder
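
These three outcomes reduce to a small decision function; a C sketch (the names and the plausibility rule are illustrative, the real check lives in anomaly_check_heartbeat):

```c
#include <stdint.h>

typedef enum { HB_ALIVE, HB_CLOCK_CORRUPT, HB_HANG } HbVerdict;

/* `max_plausible_step_ns` bounds how far the clock may advance between
   checks before the timestamp is considered implausible (illustrative). */
static HbVerdict classify(uint64_t seq, uint64_t last_seq,
                          uint64_t ts_ns, uint64_t last_ts_ns,
                          uint64_t max_plausible_step_ns) {
    int seq_advanced = seq != last_seq;
    uint64_t step = ts_ns - last_ts_ns;
    int ts_plausible = step > 0 && step <= max_plausible_step_ns;
    if (seq_advanced && ts_plausible) return HB_ALIVE;
    if (seq_advanced)                 return HB_CLOCK_CORRUPT; /* L1 alive, clock bad */
    return HB_HANG;                                            /* heartbeat stuck */
}
```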

Escalation Ladder:

| Gap | Multiplier | Time (radiation) | Time (hardened) | Action |
|---|---|---|---|---|
| Warn | 1× | 500ms | 1s | Emit AnomalyDetected(80) with data0 = gap_ns, auto-escalate profile |
| Soft restart | soft_mult× | 2.5s | 5s | L1 soft restart sequence (see below) |
| Kexec | kexec_mult× | 10s | 20s | Full hal_kexec from immutable kernel image (Phoenix Protocol). No negotiation. |

Soft restart sequence (5 steps, all in L0):

  1. Quarantine all fibers. Walk breakers[0..MAX_FIBERS], set quarantined = true. No fiber can be scheduled.
  2. CSpace audit. Run cspace_audit_all(). If corruption found, log to BEB but continue — soft restart is the remediation.
  3. Sterilize all CSpaces. Call revoke_all() on every CSpace. Epoch bumps atomically invalidate all stale capabilities.
  4. Emit event. BehaviorViolation(82) with data0 = gap_ns, data1 = heartbeat_seq_delta.
  5. Re-enter L1. Call NimMain() then kmain(). The Nim runtime re-initializes, the scheduler restarts, fibers are respawned from their NPK images via the existing Phase 6.2 respawn engine. The verdict ring is zeroed. BEB and STL state are preserved.

If soft restart succeeds (L1 heartbeat resumes within 1× threshold after re-init), the system continues. If the heartbeat gap reaches kexec_mult×, the soft restart is considered failed and Phoenix Protocol fires.
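
The ladder itself is two multiplications and three comparisons; a C sketch using the FdirConfig threshold names (illustrative, not the actual L0 code):

```c
#include <stdint.h>

typedef enum { GAP_OK, GAP_WARN, GAP_SOFT_RESTART, GAP_KEXEC } GapAction;

/* With the radiation defaults (warn = 500ms, soft_mult = 5, kexec_mult = 20)
   the rungs sit at 500ms, 2.5s, and 10s. */
static GapAction ladder(uint64_t gap_ns, uint64_t warn_ns,
                        uint8_t soft_mult, uint8_t kexec_mult) {
    if (gap_ns >= warn_ns * kexec_mult) return GAP_KEXEC;
    if (gap_ns >= warn_ns * soft_mult)  return GAP_SOFT_RESTART;
    if (gap_ns >= warn_ns)              return GAP_WARN;
    return GAP_OK;
}
```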

Why these numbers:

  • 500ms warn gives 50 missed ticks at 10ms scheduler rate. No transient GCR event lasts that long. Fast enough that ground station telemetry barely notices before the system self-corrects.
  • 5× soft restart (2.5s) gives the system time to self-heal from transient deadlocks. A soft restart at 300ms would pre-empt a deadlock that would have resolved at 400ms.
  • 20× kexec (10s) confirms this isn't transient, isn't a deadlock, and soft restart failed. On a Mars rover, kexec kills the navigation fiber tracking position relative to a cliff edge – you want 10 seconds of evidence before taking that step.

All multipliers and base thresholds are configurable per profile via KDL.

Verdict Ring (bit 7)

Shared memory ring where L1 recommends actions to L0. Single-writer (L1), single-reader (L0). No locks.

zig
pub const VerdictAction = enum(u8) {
    Noop = 0,
    Quarantine = 1,
    Demote = 2,
    Release = 3,
    Escalate = 4,        // L1 recommends profile escalation
    FdirHold = 5,         // Ground station: freeze current profile
    FdirRelease = 6,      // Ground station: resume auto-de-escalation
};

pub const VerdictEntry = struct {
    fiber_id: u64,
    action: VerdictAction,
    reason: u64,          // STL event_id that triggered recommendation
    timestamp_ns: u64,
};

pub const VerdictRing = struct {
    entries: [16]VerdictEntry,
    l1_head: u8,          // Written by L1 only (mod 16)
    l0_tail: u8,          // Read by L0 only (mod 16)
};

Memory ordering (RISC-V weak ordering): L1 must write the entry data before advancing l1_head. On RISC-V this requires a fence w,w (write-write barrier) between the entry store and the head update. L0 must read l1_head before reading entry data — fence r,r (read-read barrier). In Zig: @atomicStore(.release) for L1's head write, @atomicLoad(.acquire) for L0's head read. Entry data itself does not need atomics since it is fully written before the head advances.
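
A single-producer/single-consumer sketch of this ordering in C11 atomics, where the release store on the head plays the role of fence w,w and the acquire load plays fence r,r (Entry and the ring length mirror the Zig structs; function names are illustrative):

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdbool.h>

#define RING_LEN 16

typedef struct { uint64_t fiber_id; uint8_t action; uint64_t reason; uint64_t timestamp_ns; } Entry;

typedef struct {
    Entry entries[RING_LEN];
    _Atomic uint8_t head;  /* advanced by L1 (producer) only */
    _Atomic uint8_t tail;  /* advanced by L0 (consumer) only */
} Ring;

/* Producer: fill the slot first, then publish with a release store. */
static bool ring_push(Ring *r, Entry e) {
    uint8_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint8_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    if ((uint8_t)((h + 1) % RING_LEN) == t) return false;       /* full */
    r->entries[h] = e;                                          /* write data... */
    atomic_store_explicit(&r->head, (uint8_t)((h + 1) % RING_LEN),
                          memory_order_release);                /* ...then publish */
    return true;
}

/* Consumer: acquire-load the head so entry reads are ordered after it. */
static bool ring_pop(Ring *r, Entry *out) {
    uint8_t h = atomic_load_explicit(&r->head, memory_order_acquire);
    uint8_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    if (t == h) return false;                                   /* empty */
    *out = r->entries[t];
    atomic_store_explicit(&r->tail, (uint8_t)((t + 1) % RING_LEN),
                          memory_order_relaxed);
    return true;
}
```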

L0 processes the ring during heartbeat checks:

  • If L0's own sensors agree with the recommendation → execute
  • If L0's sensors disagree → veto, emit BehaviorViolation(82) noting the disagreement
  • FdirHold and Escalate from ground station are honored unconditionally (ground station outranks L1)

Ground Station Override Channel

Three commands flow through UTCP → userland membrane → ION → L1 → verdict ring → L0:

| Command | VerdictAction | Behavior |
|---|---|---|
| FDIR_ESCALATE <profile> | Escalate | Force profile escalation. L0 honors unconditionally. |
| FDIR_HOLD | FdirHold | Freeze current profile, prevent auto-de-escalation. Timeout after hold_timeout_ns (default 6h). |
| FDIR_RELEASE | FdirRelease | Resume automatic de-escalation. Respects floor constraint. |

Ground station outranks L1 but cannot violate the boot floor. FDIR_ESCALATE sovereign when booted as hardened is a no-op.

Utility Functions

Time source: All timestamps in anomaly.zig use rumpk_timer_now_ns() — the same canonical time source used by the trap handler and BEB. Declared as extern fn rumpk_timer_now_ns() u64 in anomaly.zig (resolved at link time from entry_riscv.zig).

BEB window query:

zig
/// Count BEB entries whose timestamp_ns falls within [now - window_ns, now].
/// Walks the ring backward from head. Returns 0 if BEB is empty or TMR corrupt.
pub fn beb_entries_in_window(window_ns: u64) u32 {
    var head: u16 = 0;
    var count: u32 = 0;
    var chain: u64 = 0;
    if (!beb.tmr_read(&head, &count, &chain)) return 0;

    const now = rumpk_timer_now_ns();
    const cutoff = now -% window_ns;
    const depth = @min(count, beb.RING_CAPACITY);
    var idx: usize = (@as(usize, head) + beb.RING_CAPACITY - 1) % beb.RING_CAPACITY;
    var found: u32 = 0;

    var i: usize = 0;
    while (i < depth) : (i += 1) {
        const entry = &beb.beb_buf.ring[idx];
        if (entry.timestamp_ns < cutoff) break; // entries are chronological; stop early
        found += 1;
        idx = (idx + beb.RING_CAPACITY - 1) % beb.RING_CAPACITY;
    }
    return found;
}

Feature mask FFI for L1:

L1 (Nim) reads the feature mask via anomaly_get_feature_mask() (exported C ABI) to gate its own checks. The scheduler calls this once per tick and caches the result locally — not once per feature check.

Dynamic Escalation

Escalation (anomaly_escalate)

zig
/// Escalate the FDIR profile. Exported for L1 FFI (verdict ring path).
/// Internal L0 callers (panic correlator, heartbeat monitor) call this
/// directly via Zig's normal call convention — the `export` keyword
/// only affects the C ABI symbol generation, not the internal call path.
pub export fn anomaly_escalate(new_profile: u8) void {
    // Escalation is one atomic write
    // New profile must be > current mask (strict superset, numerically greater)
    // New profile must be >= boot floor
    if (new_profile > feature_mask and new_profile >= boot_floor) {
        const old = feature_mask;
        feature_mask = new_profile;
        last_escalation_ns = rumpk_timer_now_ns();
        _ = emit_stl(80, 0, 0, 0, new_profile, old, 0);
    }
}

Automatic escalation triggers:

| Trigger | Source | Target Profile |
|---|---|---|
| Panic correlator fires (≥N fibers in window) | L0 | radiation |
| L1 heartbeat gap > 1× threshold | L0 | hardened |
| 3+ ECC corrections in 60s (from HAL memory scrubber) | L0 | radiation |
| 10+ BEB entries with same panic_class in 60s window | L0 | hardened |
| L1 cross-fiber correlation (≥2 fibers deviating) | L1 via verdict ring | hardened |

De-escalation

One step at a time (radiation → hardened → sovereign). Requires ALL of:

  1. Cooldown period: deescalation_cooldown_ns (default 300s) since last escalation or anomaly event
  2. CSpace audit pass: Clean walk with zero integrity faults
  3. Zero BEB entries in the cooldown window
  4. No active FDIR_HOLD
  5. Target >= boot floor (non-negotiable)
zig
pub fn anomaly_try_deescalate() void {
    if (fdir_hold_active) return;
    const now = rumpk_timer_now_ns();
    if (now -% last_escalation_ns < config.deescalation_cooldown_ns) return;
    if (beb_entries_in_window(config.deescalation_cooldown_ns) > 0) return;
    if (!last_cspace_audit_clean) return;

    const target = deescalate_one_step(feature_mask);
    if (target >= boot_floor) {
        feature_mask = target;
        _ = emit_stl(81, 0, 0, 0, target, 0, 0); // BehaviorBaseline = new stable state
    }
}

L1 Behavioral Analyst (core/baseline.nim)

Baseline Learning (Welford's Online Algorithm)

Per-fiber running statistics – O(1) per sample, no buffering, no batch computation:

nim
type BehaviorBaseline* = object
    burst_mean*: float64       # Running mean of burst_ns
    burst_m2*: float64         # Running M2 for variance (Welford)
    syscall_rate*: float64     # Syscalls per scheduling quantum
    channel_ratio*: float64    # send / (send + recv) ratio
    sample_count*: uint32      # Total samples collected
    locked*: bool              # Baseline solidified after N samples

Updated every sched_analyze_burst() call – already in the scheduler hot path:

nim
proc baseline_update*(b: var BehaviorBaseline, burst_ns: uint64) =
    b.sample_count += 1
    let n = b.sample_count.float64
    let x = burst_ns.float64
    let delta = x - b.burst_mean
    b.burst_mean += delta / n
    let delta2 = x - b.burst_mean
    b.burst_m2 += delta * delta2

    if b.sample_count >= config.baseline_lock_samples:
        b.locked = true

After locked = true:

  • σ = sqrt(M2 / (sample_count - 1))
  • If |current_burst - mean| > Nσ (configurable, default 3σ) → emit BehaviorViolation(82)
  • N is configurable via deviation_sigma in KDL

Cross-Fiber Correlation

nim
var anomaly_scores: array[MAX_FIBERS, float64]
var anomaly_window_start_ns: uint64
var anomalous_fiber_count: uint32

  • Each BehaviorViolation increments the fiber's anomaly score
  • If ≥2 fibers exceed threshold within l1_correlation_window_ns (configurable, default 500ms) → systemic event
  • L1 writes recommendation to verdict ring: Escalate with reason = STL event_id
  • L0 independently validates via its own panic correlator
  • Two independent observers reaching the same conclusion = high-confidence anomaly
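
A C sketch of the ≥2-fibers-in-window test (MAX_FIBERS and the bookkeeping are illustrative; the real state lives in the Nim arrays above):

```c
#include <stdint.h>
#include <stdbool.h>

#define MAX_FIBERS 64  /* illustrative */

/* Count fibers whose most recent violation falls inside the correlation
   window; two or more deviating fibers is treated as a systemic event.
   A zero timestamp means "no violation recorded". */
static bool systemic_event(const uint64_t last_violation_ns[MAX_FIBERS],
                           uint64_t now_ns, uint64_t window_ns) {
    uint32_t deviating = 0;
    for (uint32_t i = 0; i < MAX_FIBERS; i++) {
        if (last_violation_ns[i] != 0 &&
            now_ns - last_violation_ns[i] <= window_ns)
            deviating++;
    }
    return deviating >= 2;
}
```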

BEB Preservation Guarantee

The BEB lives in a fixed physical memory region. hal_kexec loads a new kernel via the L0 ELF loader. The ELF loader must refuse to map any PT_LOAD segment that overlaps the BEB region.

In elf_loader.zig, add a BEB exclusion check:

zig
// Before memcpy in the load loop:
if (ranges_overlap(dest_addr, dest_addr + p_memsz, BEB_BASE, BEB_BASE + BEB_SIZE)) {
    return ElfError.SegmentOverlap; // Refuse to overwrite BEB
}

This requires adding SegmentOverlap to the ElfError enum in elf_loader.zig.

The BEB base address is known at compile time. This is one branch per PT_LOAD segment – zero cost in practice since kernels have 2–4 segments.
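
ranges_overlap is referenced above but not shown; a sketch assuming half-open [start, end) intervals, which matches the dest_addr + p_memsz call shape:

```c
#include <stdint.h>
#include <stdbool.h>

/* Half-open interval overlap: [a0, a1) and [b0, b1) intersect
   iff a0 < b1 && b0 < a1. */
static bool ranges_overlap(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1) {
    return a0 < b1 && b0 < a1;
}
```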

Additionally, anomaly_init() must verify the BEB is intact after a kexec by checking the TMR header magic bytes. If the BEB survived, the chain is walked to establish the prev_hash for continued logging. If the BEB is corrupted, it is re-initialized (accepting the forensic data loss) and a BebOverflow(67) event is emitted to signal the discontinuity.

Ontology Extensions

All event kinds were pre-wired in Phase 6.2. No new EventKind values needed.

| EventKind | Value | Emitter | data0 | data1 | data2 |
|---|---|---|---|---|---|
| AnomalyDetected | 80 | L0 | context-dependent | context-dependent | – |
| BehaviorBaseline | 81 | L0/L1 | new_profile (de-escalation) | – | – |
| BehaviorViolation | 82 | L0/L1 | fiber_id or gap_ns | deviation_score | – |
| IntegrityAudit | 83 | L0 | fiber_id | slot_index | check_type |

AnomalyDetected(80) context by source:

| Source | data0 | data1 |
|---|---|---|
| Panic correlator | fiber_count | CORRELATION_RADIATION (1) |
| Heartbeat warn | gap_ns | heartbeat_seq delta |
| ECC escalation | ecc_count | window_ns |
| Profile escalation | new_profile | old_profile |

Files

New Files

| File | Purpose | Est. LOC | Layer |
|---|---|---|---|
| hal/anomaly.zig | Paranoid Watchdog – correlator, audit, heartbeat, verdict ring, profile manager, escalation | ~400 | L0 |
| core/baseline.nim | Behavioral Analyst – Welford baselines, cross-fiber correlation, verdict ring writer | ~200 | L1 |

Modified Files

| File | Change |
|---|---|
| hal/trap.zig | Call anomaly_record_panic() after verdict (gated on bit 6) |
| hal/entry_riscv.zig | Heartbeat check in timer interrupt path (gated on bit 3) |
| hal/abi.zig | Comptime import anomaly.zig + FFI re-exports for anomaly_heartbeat, anomaly_escalate, anomaly_init, anomaly_get_feature_mask |
| hal/elf_loader.zig | Add SegmentOverlap to ElfError enum; BEB exclusion range check in PT_LOAD loop |
| hal/cspace.zig | Add cspace_audit_all() function for L0 integrity walk |
| core/sched.nim | Call baseline_update() in sched_analyze_burst() (gated on bit 4); call anomaly_heartbeat() every tick (gated on bit 3) |
| core/fiber.nim | Add baseline: BehaviorBaseline field to FiberObject |

Testing Strategy

Unit Tests (per-file, zig build test)

anomaly.zig:

  • Panic correlator: 2 fibers in window → no trigger; 3 fibers → trigger; fibers outside window → no trigger
  • CSpace audit: clean CSpace → pass; corrupted epoch → fail with correct fiber_id; ghost cap → fail
  • Heartbeat: sequence advances with plausible timestamp → alive; sequence stuck → hang detected; sequence advances but timestamp stuck → clock corruption (different from hang)
  • Verdict ring: L1 writes, L0 reads in order; ring wrap; L0 veto on disagreement
  • Profile escalation: escalate above current → success; escalate below floor → no-op; de-escalation with dirty BEB → blocked; de-escalation with clean window → success
  • Feature mask gating: disabled feature check returns false; enabled returns true
  • Config: default thresholds per profile are correct; KDL override applies

baseline.nim:

  • Welford: known sequence → correct mean and variance; locked after N samples; deviation detection at 3σ
  • Cross-fiber: single fiber anomaly → no systemic flag; 2+ fibers → systemic; verdict ring write

Integration Tests

  • Full panic → correlator → escalation: inject 3 page faults from 3 different fibers within 100ms → verify AnomalyDetected(80) → verify profile escalated → verify radiation suppression active
  • Heartbeat gap → soft restart: stop heartbeat ticks → verify warn at 1× → verify soft restart at 5×
  • CSpace corruption → quarantine: flip a bit in a capability → verify IntegrityAudit(83) → verify fiber quarantined
  • Ground station override: inject FDIR_HOLD → verify no de-escalation during hold → inject FDIR_RELEASE → verify de-escalation resumes
  • De-escalation: escalate to radiation → wait cooldown → clean audit → verify step-down to hardened
  • BEB preservation: verify elf_loader rejects PT_LOAD overlapping BEB region

Related Documents

  • Phase 6.2: Restart Trap – trap.zig, beb.zig, respawn.zig, elf_loader.zig
  • SPEC-020: Capability Algebra – CSpace integrity model
  • SPEC-060: System Ontology – STL event framework (EventKind 80–83)
  • RFC-0110: Membrane Agent – telemetry relay (consumer of anomaly events)
  • RFC-0649: EPOE – System survival through restart
  • ECSS-E-ST-70-41C: Onboard autonomy requirements for FDIR