NexFS

v1.3.0 — 375+ Tests Passing

NexFS is the Nexus sovereign filesystem — a flash-native, graph-structured storage layer implemented in Zig with zero dynamic allocation. It runs on everything from bare-metal microcontrollers with 64 KB RAM to multi-hundred-terabyte Homenodes. No libc. No OS required.

Objects exist in a directed graph — location is a relationship, not an identity.

Address Space (v4)

NexFS v4 uses u64 block addressing throughout. At 4 KB block size, a single volume addresses 64 ZB — enough for a Homenode with 352 TB of mixed NVMe + SAS storage, a 1,000-node Chapter at 352 PB, or planetwide mesh at exabyte-to-zettabyte scale over a 15+ year horizon.

| Field | Width | Ceiling @ 4 KB blocks |
|---|---|---|
| BlockAddr | u64 | 64 ZB per volume |
| block_count (Superblock) | u64 | 64 ZB |
| bucket_id (BamEntry) | u32 | 4B buckets (~8 EB at Hydra bucket size) |
| data_count (Allocator) | u64 | 64 ZB |
| Inode.block_count | u64 | 64 ZB per file |
| InodeId | u32 | 4B inodes per volume (cross-node uses CAS CIDs) |
| Inode.size | u64 | 16 EB per file |
| device_size | u64 | 16 EB |

BamEntry overhead: 16 bytes per 4 KB block, ~0.4% of raw capacity. Negligible. The 256-byte Superblock still fits in a single 512-byte minimum block with room for future mesh and sovereignty fields.

Architecture

NexFS is built from 18 composable modules. Each module is a self-contained compilation unit with no hidden state:

superblock ─── inode ─── dir ─── dir_ops ─── path ─── graph
    │              │                │
  format        file ── cow      alloc ─── bam
    │              │
checkpoint     journal ── cas ── cas_journal
    │                       │
  scrub          health   features     compress

                           xattr ── lock ── watch

On-device layout (block-linear):

┌──────────────┬──────────────┬──────┬─────────────────┬───────────────┐
│ Superblock 0 │ Superblock 1 │ BAM  │  Inode Table    │  Data Blocks  │
│ (256B, dual) │ (256B, dual) │      │  (4 blocks)     │               │
└──────────────┴──────────────┴──────┴─────────────────┴───────────────┘
   Block 0         Block 1     Blk 2    Blocks 3-6        Block 7+

Core Format

Fixed-size, alignment-safe structures for deterministic flash access:

| Structure | Size | Purpose |
|---|---|---|
| Superblock | 256 bytes (dual + scattered) | Volume identity, format v5, generation counter, feature flags, mesh fields |
| Inode | 128 bytes | File/directory metadata, inline extent, checksum |
| Extent | 16 bytes | Contiguous block range (logical → physical mapping) |
| DirEntry | 48 bytes + name + tag | Directory entry with edge type and optional tag |
| BAM Entry | 16 bytes | Block Allocation Map — state, erase count, owner (u32 bucket_id) |

Superblock Resilience: Dual + Scattered Replicas

NexFS writes superblocks at blocks 0 and 1 as primary and backup. v1.3.0 adds scattered superblock replicas — up to 16 additional copies spread deterministically across the data region. On mount, if blocks 0 and 1 are both corrupt, NexFS calculates all replica positions from volume geometry alone and recovers from the highest-generation valid replica.

| Profile | Replicas | Convergence |
|---|---|---|
| Core | 4 (2 scattered) | 1 checkpoint |
| Sovereign | 8 (6 scattered) | 3 checkpoints |
| Mesh | 10 (8 scattered) | 4 checkpoints |

Each checkpoint writes blocks 0+1 plus K=2 rotating scattered replicas. All replicas converge over ceil((N-2)/K) checkpoints. Replica positions are derived from volume geometry — no stored map, no chicken-and-egg.

Per-Inode Checksums

Every inode carries a checksum computed over all metadata fields. The hash algorithm is selectable per individual inode — not per-filesystem. A sensor logging temperature data can use XXH3 for speed, while a signing key stored on the same volume uses BLAKE3 for cryptographic integrity.

| Hash | Enum | Output | Use Case |
|---|---|---|---|
| BLAKE3-256 | 0x00 | 256-bit | Default. CAS addressing, Merkle DAG, provenance chains |
| BLAKE3-128 | 0x01 | 128-bit (padded) | Constrained devices needing crypto guarantees |
| XXH3-64 | 0x02 | 64-bit (padded) | Ultra-fast integrity for high-throughput embedded |

Graph-Native Directories

NexFS directories are not flat lists of names — they are typed edge sets in a directed graph. Every directory entry carries an explicit edge type that defines the relationship between parent and child:

| Edge Type | Value | Semantics |
|---|---|---|
| primary | 0x00 | Normal parent-child ownership. GC follows these. |
| reference | 0x01 | Non-owning "also lives here" — like a symlink that doesn't break |
| pin | 0x02 | Anchor edge. Prevents garbage collection even if all other edges are removed |
| projection | 0x03 | Auto-generated. Smart folder / query result |

Edge Tags

Each directory entry can carry an arbitrary tag (up to 63 bytes) — metadata attached to the relationship, not the file. Use cases:

  • Capability tokens on mount edges
  • Version labels on DAG edges
  • Role annotations ("owner", "reviewer") on reference edges

RevMap: Reverse Edge Index

The RevMap is an in-memory reverse index built lazily by scanning the directory tree. It maps every inode to all edges pointing at it — enabling O(1) answers to "who references this file?" without a full tree walk.

  • Orphan detection: revmap.isOrphan(inode_id) — zero incoming edges
  • Edge queries: revmap.queryEdges(inode_id) — all parents and edge types
  • GC safety: removeEdge() refuses to create orphans unless force=true
  • Depth-limited scan: Recursion capped at 64 levels to prevent stack overflow from cycles

File Operations

Extent-based file I/O with seek support:

| Operation | Function | Notes |
|---|---|---|
| Open | FileOps.open() | Returns handle with position tracking |
| Read | FileOps.read() | Extent-resolved block reads. Returns 0 at EOF. |
| Write | FileOps.write() | Allocates blocks via BAM, extends inode |
| Seek | FileOps.seek() | Absolute, relative, or from-end positioning |
| Truncate | truncateInode() | Shrink or extend. Frees released blocks. |
| Close | FileOps.close() | Releases handle state |

Copy-on-Write Cloning

cloneInode() creates a deep copy of a file's extent tree. The clone gets its own inode with FLAG_COW set. Data blocks are physically copied (not shared), so the clone is fully independent. Maximum 64 extents per clone operation.

Subsystems

Checkpoint

Atomic metadata flush to persistent storage:

  1. Seal all open allocation buckets
  2. Flush the Block Allocation Map
  3. Increment the superblock generation counter
  4. Write both superblocks (primary + backup)
  5. Write K=2 rotating scattered replicas (if REQ_SCATTERED_SB)
  6. Sync to flash — errors propagate (not silently swallowed)

Write-Ahead Journal

Intent-logging journal for crash-safe multi-step operations:

begin() → recordIntent(addr, data) → commit() → recover()

Recovery semantics: entries in committed state represent writes that were already executed before commit() was called. The journal is cleared on recovery.

Integrity Scrub

Background integrity scanner that walks every allocated inode in the table:

  • Validates inode checksums against stored hashes
  • Counts total inodes scanned and checksum errors found
  • Returns ScrubResult with error counts — zero errors means clean volume
  • Available via C-FFI as nexfs_scrub()

Extended Attributes

Typed attribute slots on inodes (not POSIX xattr — purpose-built for Nexus capabilities):

| Type | Value | Purpose |
|---|---|---|
| capability_perms | 0x01 | Capability permission bitfield (SPEC-051) |
| cas_cid | 0x02 | CAS content ID, 32 bytes |
| encryption_flags | 0x03 | Encryption configuration |
| provenance_hash | 0x04 | Provenance chain hash |
| custom | 0xFF | Arbitrary key-value (63-byte key, 255-byte value) |

Block Allocation

Bitmap-based allocator with bucket lifecycle tracking:

| Bucket State | Value | Meaning |
|---|---|---|
| free | 0x00 | Available for allocation |
| writing | 0x01 | Currently being written to |
| full | 0x02 | Sealed, contains live data |
| evacuating | 0x03 | Being emptied by GC (Hydra phase) |
| parity | 0x04 | Erasure coding data (Hydra phase) |

Additional Modules

| Module | Purpose |
|---|---|
| Compress | ZSTD (1–22) + RLE compression with per-node/per-chunk level selection and double checksum (FLAG_COMPRESSED) |
| Lock | Advisory inode locking with table-based tracking |
| Watch | Inode change notification (create, modify, delete events) |
| Health | Flash health statistics from BAM erase counts |
| Path | Full path resolution with loadSafe() bounds checking |

Volume Profiles

NexFS scales via the Baukasten (building-block) model — three profiles activate different module sets:

| Profile | Storage Class | Footprint | Features |
|---|---|---|---|
| Core | 0x00 | ~40 KB | Block I/O, inodes, BAM, checksums, scrub, per-bucket RLE block compression |
| Sovereign | 0x01 | ~400 KB | + CAS, CDC, DAG versioning, TimeWarp snapshots, ZSTD compression (1–22) |
| Mesh | 0x02 | ~480 KB | + Wire protocol, peer sync, gossip, ZSTD compression (1–22) |

Compression (v1.1.0)

NexFS supports four compression granularity modes, selectable at format time:

| Mode | Profile | Granularity | Use Case |
|---|---|---|---|
| per_bucket | Core | Per-block via BAM entry | IoT/satellite – RLE per block, radiation-tolerant |
| per_dag_node | Sovereign | Per-file via DAG metadata | File-level default algo+level |
| per_cas_chunk | Sovereign | Per-CAS-chunk | Chunk-level granularity |
| per_dag_and_chunk | Sovereign | DAG default + chunk override | ZSTD:2 on system, ZSTD:18 on archives – same volume |

Double-checksum integrity (ZSTD only): XXH3-64 of compressed data is computed during the streaming compress pass (piped, zero extra cost) and verified before decompression. Bit flips are caught before ZSTD touches the data.

Per-bucket block compression operates at the file I/O layer. Each block's BAM entry carries comp_algo and comp_level. On write, the full block is compressed and stored with a 2-byte length prefix. On read, the block is decompressed before extracting the requested portion. Partial-block writes use read-modify-write to preserve existing data.

Feature Flags & Runtime Tuning (v1.3.0)

Two 32-bit bitmasks in the superblock control mount-time compatibility:

  • required_features — unknown set bits prevent mounting (forward compatibility)
  • optional_features — unknown set bits are safe to ignore

| Required Flag | Bit | Effect |
|---|---|---|
| REQ_COMPRESSION | 0 | Block-level compression active |
| REQ_ZSTD | 1 | ZSTD compression (implies REQ_COMPRESSION) |
| REQ_DOUBLE_CHECKSUM | 2 | XXH3-64 secondary hash on compressed blocks |
| REQ_CAS_DEDUP | 3 | Content-addressable deduplication |
| REQ_SCATTERED_SB | 6 | Superblock scattered across volume |

| Optional Flag | Bit | Effect |
|---|---|---|
| OPT_RECOMPRESSION | 0 | Background recompression enabled |
| OPT_MESH_SYNC | 1 | Mesh synchronisation active |
| OPT_TIMEWARP | 4 | TimeWarp snapshot layer |

Feature parameters at superblock offsets 0x64–0x68 provide scalar configuration for active features:

| Param | Offset | Gate | Default (Sovereign) |
|---|---|---|---|
| sb_replica_count | 0x64 | REQ_SCATTERED_SB | 8 |
| sb_checkpoint_batch | 0x65 | REQ_SCATTERED_SB | 2 |
| recompress_target_algo | 0x66 | OPT_RECOMPRESSION | ZSTD (0x02) |
| recompress_target_level | 0x67 | OPT_RECOMPRESSION | 3 |
| recompress_free_floor | 0x68 | OPT_RECOMPRESSION | 10% |

All parameters are runtime-tunable via nexfs_tune() without unmounting.

COW Re-Compression (v1.3.0)

Background re-compression upgrades stored chunks to better algorithms without data risk. The filesystem provides the mechanism; policy (aging, scheduling, pacing) lives in the Nim Membrane daemon.

The invariant: never modify a committed block. Old data survives until new data is written and journaled.

  1. Read chunk → decompress → verify BLAKE3 CID
  2. Recompress with target algo/level
  3. COW: allocate new block(s), write, journal, update CAS entry
  4. Free old block(s)

Batch mode groups up to 32 chunks in a single journal transaction with full rollback support. A free-space floor (default 10%) prevents recompression from filling the volume — the guard is a crash-safety invariant, not optional policy.

Sovereign Extensions

When running with the Sovereign profile:

  • Content-Addressable Store (CAS): Files addressed by BLAKE3 hash. Automatic deduplication. ZSTD compression with double-checksum integrity on put/get.
  • Content-Defined Chunking (CDC): Large files split at content-determined boundaries. Small edits re-store only changed chunks.
  • DAG Versioning: File history as a Merkle DAG. Efficient branching, merging, and cryptographic history verification.
  • TimeWarp Snapshots: Instant O(1) filesystem snapshots via copy-on-write DAG nodes.

C FFI

NexFS exposes a complete C-compatible API for integration with the Rumpk kernel and other system components:

```c
// Lifecycle
nexfs_format(cfg)                    // Format a flash volume
nexfs_mount(cfg)                     // Mount (dual-SB failover)
nexfs_unmount()                      // Unmount
nexfs_is_mounted()                   // Check mount status
nexfs_sync()                         // Flush pending writes
nexfs_checkpoint()                   // Atomic metadata checkpoint

// File operations
nexfs_create(path, mode)             // Create file, returns inode ID
nexfs_read(path, buf, len)           // Read file data
nexfs_write(path, buf, len)          // Write file data
nexfs_delete(path)                   // Delete file
nexfs_truncate(path, new_size)       // Truncate/extend file
nexfs_rename(old_path, new_path)     // Move/rename
nexfs_clone(src_path, dst_path)      // CoW deep copy
nexfs_shred(path)                    // Secure erase (overwrite + flash erase)

// Directory operations
nexfs_mkdir(path, mode)              // Create directory
nexfs_rmdir(path)                    // Remove empty directory

// System
nexfs_scrub(result)                  // Integrity scan
nexfs_health(stats)                  // Flash health statistics
nexfs_lock(inode_id)                 // Advisory lock (non-blocking)
nexfs_unlock(inode_id)               // Release lock

// Recompression
nexfs_recompress(cid, algo, level)   // COW recompress single chunk
nexfs_recompress_batch_begin()       // Start batch transaction
nexfs_recompress_enqueue(cid, a, l)  // Enqueue chunk
nexfs_recompress_batch_commit()      // Commit all + free old blocks
nexfs_recompress_batch_rollback()    // Undo all

// Tuning
nexfs_tune(param, value)             // Runtime parameter adjustment
```

All functions return 0 on success or a negative error code. Read/write return byte counts.

Wire Protocol

For networked volumes (Mesh profile), NexFS uses a wire protocol over UTCP:

| Message | Purpose |
|---|---|
| BLOCK_WANT | Request a block by hash |
| BLOCK_PUT | Deliver a block |
| DAG_SYNC | Synchronize DAG heads between peers |

Design Principles

| Principle | Implementation |
|---|---|
| Zero dynamic allocation | All buffers caller-provided. No malloc, no heap. |
| No-std compatible | Zig with no libc. Runs on bare metal. |
| Flash-native | Designed for NOR/NAND characteristics. No FTL assumption. |
| Graph-first | Directories are edge sets, not flat lists. |
| Integrity by default | Per-inode checksums. Dual superblock. Scrub. |
| Fail-safe writes | Checkpoint propagates errors. Journal for multi-step ops. |

Comparison

For an honest, detailed comparison of NexFS against ext4, F2FS, XFS, ZFS, Btrfs, bcachefs, and HAMMER2, see NexFS vs. The Field.

License

LSL-1.0 (Libertaria Sovereign License)