Storage & Network Decisions
Six decisions spanning how Nexus stores data and moves packets — and why neither looks like Linux.
ST1: Graph-Native Filesystem
Status: Accepted
Context
POSIX filesystems are hierarchical trees: one parent per directory entry, metadata coupled to allocation. Content-addressed systems (IPFS, Git) decouple storage from naming but lack directory hierarchy. NexFS needed both — a filesystem that's content-addressed at the block level but navigable as a directory graph.
Decision
NexFS uses content-addressed blocks with graph-native directories:
- Every block is BLAKE3-hashed at write time (per-inode selectable: BLAKE3-256, BLAKE3-128, XXH3-64)
- Directories store typed edges: primary (ownership), reference (hard link), pin (immutable ref), projection (computed view)
- Identical blocks share storage transparently (deduplication)
- Dual superblock with automatic failover on corruption
- Merkle tree from superblock to blocks enables incremental verification
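The dedup-and-verify behavior above can be sketched in a few lines. This is an illustrative model, not NexFS code: it uses Python's stdlib `blake2b` as a stand-in for BLAKE3 (which is not in the stdlib), and a dict as a stand-in for the on-disk block map.

```python
import hashlib

class BlockStore:
    """Content-addressed block store: identical blocks are stored once."""
    def __init__(self):
        self.blocks = {}  # CID (hash digest) -> block bytes

    def write(self, data: bytes) -> bytes:
        # Stand-in for BLAKE3-256: stdlib blake2b truncated to 32 bytes.
        cid = hashlib.blake2b(data, digest_size=32).digest()
        self.blocks.setdefault(cid, data)  # dedup: no-op if CID already present
        return cid

    def read(self, cid: bytes) -> bytes:
        data = self.blocks[cid]
        # Corruption is detectable on every read: recompute and compare.
        if hashlib.blake2b(data, digest_size=32).digest() != cid:
            raise IOError("block corrupted: hash mismatch")
        return data

store = BlockStore()
a = store.write(b"hello")
b = store.write(b"hello")  # identical content -> same CID, stored once
```

Because the CID is derived from the content, writing the same block twice cannot consume extra space, and a bit-flip on disk surfaces as a hash mismatch at read time rather than as silent corruption.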
Alternatives Rejected
| Option | Why Not |
|---|---|
| ext4 / BTRFS | Block allocation opaque, no content addressing, no typed edges |
| ZFS | Feature-rich but 100MB+ kernel footprint, CDDL licensing concerns |
| IPFS-style flat store | No directory hierarchy, slow metadata lookup |
| SQLite-as-filesystem | ACID overhead per write, not suitable for flash-native operation |
Consequences
- Deduplication automatic and transparent (identical data stored once)
- Corruption detectable on every read (hash mismatch)
- COW cloning is O(1) — just add a reference edge
- RevMap enables reverse traversal (who links to this inode?)
- Block allocation requires hash computation (10-50 µs per write)
- No POSIX semantics (Membrane translates for legacy apps)
ST2: No /dev, /proc, /sys
Status: Accepted
Context
Linux pseudo-filesystems (/dev, /proc, /sys) evolved organically over 30 years. They're fragile to parse (heuristic text scraping), prone to information leaks (rootkits hide in /proc), and inconsistent (each subsystem formats differently). They're also the #1 source of container escape bugs.
Decision
Replace pseudo-filesystems with explicit interfaces:
- `/Bus/`: Device discovery via symlinks to driver NPI objects (e.g., `/Bus/Net/eth0` → `/Cell/Driver/net-intel/npi`)
- ProvChain ledger: Process introspection via a formal append-only event log (not file scraping)
- KDL manifests: System configuration via declarative files (not sysctl)
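The `/Bus/` model reduces device discovery to ordinary directory operations. A sketch of that idea using a plain POSIX directory tree (the paths mirror the example above; nothing here is actual Nexus code):

```python
import os, tempfile

# Hypothetical layout mirroring /Bus/Net/eth0 -> /Cell/Driver/net-intel/npi.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "Bus", "Net"))
os.makedirs(os.path.join(root, "Cell", "Driver", "net-intel"))
open(os.path.join(root, "Cell", "Driver", "net-intel", "npi"), "w").close()
os.symlink(os.path.join(root, "Cell", "Driver", "net-intel", "npi"),
           os.path.join(root, "Bus", "Net", "eth0"))

# Discovery is listdir + readlink -- structured, no text scraping.
devices = os.listdir(os.path.join(root, "Bus", "Net"))
target = os.readlink(os.path.join(root, "Bus", "Net", "eth0"))
```

Contrast with `/proc`: a monitor does not parse formatted text, it enumerates entries and follows links, so there is no heuristic format for a rootkit to spoof.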
Alternatives Rejected
| Option | Why Not |
|---|---|
| Keep /dev | Device node permission management is fragile, hot-plug handling awkward |
| Enhanced /proc | Encourages heuristic monitoring, harder to formalize and verify |
| Plan 9 namespaces | Elegant but requires discipline across the entire stack; hard in mixed systems |
Consequences
- Service discovery is declarative (`stat()` on `/Bus/` reveals available hardware)
- Device state observable via formal ledger (structured, not heuristic)
- No pseudo-filesystem to spoof (eliminates an entire class of rootkit techniques)
- Legacy tools expecting `/proc/cpuinfo` need Membrane translation
- Standard monitoring tools (`htop`, `ps`) need reimplementation
ST3: CBOR Wire Format
Status: Accepted
Context
Object serialization needs a binary format. Protocol Buffers require code generation and .proto files. MessagePack lacks schema semantics. JSON is text (2x larger, slower parsing). Nexus serializes objects extensively — CAS objects, ION Ring messages, ProvChain entries.
Decision
CBOR (RFC 8949) for all serialization:
- Binary CIDs (BLAKE3 hashes) embedded as native CBOR `bstr` (byte strings)
- Objects tagged with CBOR semantic tags (e.g., tag 42 for Fiber, tag 43 for Event)
- Optional fields map to CBOR `null` (backward compatible without schema versioning)
- Self-describing: readers auto-detect types via tags
Alternatives Rejected
| Option | Why Not |
|---|---|
| Protocol Buffers | Requires .proto files, code generation build step, Google ecosystem dependency |
| MessagePack | No schema semantics, no standard way to embed cryptographic hashes |
| JSON | Text format, ~2x larger, slower parsing, no binary types |
| FlatBuffers | Zero-copy but complex schema management, limited language support |
Consequences
- No code generation needed (CBOR is intrinsic to data)
- Self-describing (new fields added without breaking old readers)
- Backward compatible (optional fields = null)
- Native binary type for CIDs (no hex encoding overhead)
- Larger than hand-optimized binary formats (~10-15% overhead)
- CBOR tooling less mature than Protocol Buffers ecosystem
N1: TCP/IP in Userland
Status: Accepted
Context
Linux puts TCP/IP in the kernel: complex, hard to modify, with context-switch overhead on every socket operation. Dedicated TCP offload hardware is expensive and inflexible. LwIP (Lightweight IP) is a proven, BSD-licensed embedded TCP/IP stack with a core of roughly 20 KB.
Decision
LwIP runs in userland as part of the Membrane (libnexus.a):
- Linked into every application binary as a static library
- ION Rings carry raw Ethernet frames: NIC RX → kernel (NetSwitch) → app's LwIP instance
- Kernel never parses IP headers — only L2 frame forwarding
- Each application controls its own TCP behavior (congestion, windows, timeouts)
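The frame path above can be modeled with per-cell queues. This is a toy model under stated assumptions: MACs identify cells, a `deque` stands in for an ION Ring, and the kernel side reads only the L2 header, never the IP payload.

```python
from collections import deque

class NetSwitch:
    """Kernel-side L2 forwarder: destination MAC -> owning cell's RX ring."""
    def __init__(self):
        self.rings = {}  # MAC -> per-cell RX ring (stand-in for an ION Ring)

    def attach(self, mac: bytes):
        self.rings[mac] = deque()

    def rx(self, frame: bytes):
        dst = frame[0:6]              # Ethernet destination MAC
        ring = self.rings.get(dst)
        if ring is not None:
            ring.append(frame)        # forwarded opaque: no IP parsing here

sw = NetSwitch()
mac_a, mac_b = b"\x02\x00\x00\x00\x00\x01", b"\x02\x00\x00\x00\x00\x02"
sw.attach(mac_a); sw.attach(mac_b)
frame = mac_a + mac_b + b"\x08\x00" + b"ip-payload-opaque-to-kernel"
sw.rx(frame)
```

Everything after the ring (IP, TCP, sockets) happens inside the receiving application's own LwIP instance, which is why one app's stack corruption cannot reach another's.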
Alternatives Rejected
| Option | Why Not |
|---|---|
| Kernel TCP (Linux model) | Context switch overhead, complex kernel code, one-size-fits-all configuration |
| Custom TCP in Nim | 5000+ lines of careful state machine; LwIP is proven and maintained |
| eBPF TCP | Bleeding edge, ecosystem immature, licensing complexity |
Consequences
- Per-app networking isolation (one app's corruption can't crash another's TCP stack)
- No kernel context switch for network I/O
- Apps can tune TCP independently (different congestion algorithms per workload)
- Memory overhead: 50-100 KB TCP/IP code per application
- Requires LwIP integration with fiber scheduler (no blocking syscalls)
N2: UTCP Over QUIC
Status: Accepted
Context
QUIC is designed for Internet-scale web traffic: 5000+ lines of state machine, complex connection migration, certificate management. Nexus clusters operate on local networks with known peers, exchanging messages (not streams). QUIC is massive overkill.
Decision
UTCP (sovereign transport) for Nexus-to-Nexus communication:
- Identity-centric: SipHash-128 CellID addressing, not IP:port
- Message-native: Datagram framing, not byte streams
- NACK-based: Assume good network, retransmit only on gap detection
- L2 fork: EtherType `0x88B5` (UTCP) vs. `0x0800` (IPv4) at NetSwitch
- Survives interface migration (WiFi → Ethernet) without session loss
TCP/IP (via LwIP) remains available for Internet-facing traffic. UTCP handles intra-cluster only.
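The NACK model inverts TCP's ACK logic: the receiver stays silent while delivery succeeds and speaks up only when it sees a sequence gap. A minimal sketch of the receiver side (the function and field names are illustrative, not the UTCP wire format):

```python
def receive(seqs_seen, incoming_seq, highest):
    """Record an arriving message; return (missing seqs to NACK, new highest)."""
    nack = [s for s in range(highest + 1, incoming_seq) if s not in seqs_seen]
    seqs_seen.add(incoming_seq)
    return nack, max(highest, incoming_seq)

seen, high = {0}, 0
nack1, high = receive(seen, 1, high)  # in order: nothing to NACK
nack2, high = receive(seen, 4, high)  # 2 and 3 never arrived: gap detected
```

On a low-loss LAN this means near-zero control traffic in the common case; the flip side (noted under Consequences) is that a lossy WAN triggers frequent NACK storms, which is why UTCP stays intra-cluster.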
Alternatives Rejected
| Option | Why Not |
|---|---|
| Full QUIC | 10x complexity for cluster communication; certificate management overhead |
| Raw TCP/UDP | Requires IP configuration, routing, ARP; not message-native |
| gRPC/HTTP | Application-layer complexity, inefficient binary serialization |
| Custom over UDP | Still requires IP stack; UTCP operates at L2 directly |
Consequences
- Native message boundaries (no stream reassembly)
- Direct CellID addressing (no DNS, no IP routing for local peers)
- ~50x simpler state machine (~100 LOC vs. 5000+ for QUIC)
- Session survives network interface changes
- Limited to Nexus ecosystem (can't talk to legacy Internet services)
- NACK model assumes low-loss network (degrades on congested WAN)
N3: L2 Switching Only
Status: Accepted
Context
Kernel-level IP routing adds complexity: routing tables, path selection, ARP caching, ICMP handling. Each piece is a security surface and a maintenance burden. In Nexus, each cell has its own LwIP instance — it handles its own IP.
Decision
The kernel (NetSwitch) operates at Layer 2 only:
- Frame forwarding based on EtherType:
  - `0x0800` (IPv4) / `0x86DD` (IPv6) → route to cell's Membrane
  - `0x88B5` (UTCP) → route to UTCP handler fiber
  - `0x4C57` (LWF) → route to Libertaria Wire Frame handler
- No IP parsing, no routing tables, no ARP cache in kernel
Alternatives Rejected
| Option | Why Not |
|---|---|
| Kernel L3 routing | Couples cells together, harder to isolate, adds kernel complexity |
| No switching (direct NIC per cell) | Requires multiple NICs or SRIOV; not always available |
| Virtual switch (OVS-style) | Learning tables, spanning tree, broadcast handling — overkill for single host |
Consequences
- Kernel is IP-unaware (smaller, easier to verify, smaller attack surface)
- Each cell's networking is fully independent
- No routing table consistency problems
- L2 broadcast domains may be large (ARP overhead on many-cell systems)
- Multi-host networking requires a management plane (not kernel's job)