
Storage & Network Decisions

Six decisions spanning how Nexus stores data and moves packets — and why neither looks like Linux.

ST1: Graph-Native Filesystem

Status: Accepted

Context

POSIX filesystems are hierarchical trees: one parent per directory entry, metadata coupled to allocation. Content-addressed systems (IPFS, Git) decouple storage from naming but lack directory hierarchy. NexFS needed both — a filesystem that's content-addressed at the block level but navigable as a directory graph.

Decision

NexFS uses content-addressed blocks with graph-native directories:

  • Every block is BLAKE3-hashed at write time (per-inode selectable: BLAKE3-256, BLAKE3-128, XXH3-64)
  • Directories store typed edges: primary (ownership), reference (hard link), pin (immutable ref), projection (computed view)
  • Identical blocks share storage transparently (deduplication)
  • Dual superblock with automatic failover on corruption
  • Merkle tree from superblock to blocks enables incremental verification
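A rough sketch of the first three points, in plain Python. `hashlib.blake2b` stands in for BLAKE3 (which is not in the standard library), and the class and field names are illustrative, not NexFS APIs:

```python
import hashlib

def block_id(data: bytes) -> str:
    # BLAKE2 stands in for BLAKE3 here; only the content-addressing idea matters.
    return hashlib.blake2b(data, digest_size=32).hexdigest()

class BlockStore:
    """Content-addressed store: identical blocks are stored exactly once."""
    def __init__(self):
        self.blocks = {}                     # block_id -> data
    def put(self, data: bytes) -> str:
        bid = block_id(data)
        self.blocks.setdefault(bid, data)    # dedup: no-op if already present
        return bid

class Directory:
    """Graph-native directory: typed edges instead of a single parent link."""
    def __init__(self):
        self.edges = []                      # (edge_type, name, target_id)
    def link(self, edge_type: str, name: str, target: str):
        assert edge_type in ("primary", "reference", "pin", "projection")
        self.edges.append((edge_type, name, target))

store = BlockStore()
a = store.put(b"hello world")
b = store.put(b"hello world")                # identical content -> same ID
assert a == b and len(store.blocks) == 1     # stored once (deduplication)

root = Directory()
root.link("primary", "greeting.txt", a)      # ownership edge
clone = Directory()
clone.link("reference", "greeting.txt", a)   # O(1) clone: just add an edge
```

The COW-clone consequence below falls out directly: cloning never copies blocks, it only adds a `reference` edge pointing at existing content.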

Alternatives Rejected

  • ext4 / BTRFS: Block allocation opaque, no content addressing, no typed edges
  • ZFS: Feature-rich but 100MB+ kernel footprint, CDDL licensing concerns
  • IPFS-style flat store: No directory hierarchy, slow metadata lookup
  • SQLite-as-filesystem: ACID overhead per write, not suitable for flash-native operation

Consequences

  • Deduplication automatic and transparent (identical data stored once)
  • Corruption detectable on every read (hash mismatch)
  • COW cloning is O(1) — just add a reference edge
  • RevMap enables reverse traversal (who links to this inode?)
  • Block allocation requires hash computation (10-50 µs per write)
  • No POSIX semantics (Membrane translates for legacy apps)
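The "corruption detectable on every read" and Merkle-verification consequences can be shown together in a small sketch (again with `hashlib.blake2b` standing in for BLAKE3; the layout is illustrative, not the on-disk NexFS format):

```python
import hashlib

def h(data: bytes) -> bytes:
    # BLAKE2 stands in for BLAKE3 (not in the stdlib).
    return hashlib.blake2b(data, digest_size=32).digest()

blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
leaves = [h(b) for b in blocks]

# Fold leaf hashes pairwise up to a single root, as stored in the superblock.
level = leaves
while len(level) > 1:
    level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
root = level[0]

def read_block(i: int) -> bytes:
    data = blocks[i]
    if h(data) != leaves[i]:                 # every read re-hashes the block
        raise IOError(f"corruption detected in block {i}")
    return data

assert read_block(2) == b"block-2"
blocks[1] = b"bit-rot"                       # simulate silent corruption
try:
    read_block(1)
except IOError:
    pass                                     # mismatch caught on the next read
```

Incremental verification means only the path from a changed block up to the root needs re-hashing, not the whole tree.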

ST2: No /dev, /proc, /sys

Status: Accepted

Context

Linux pseudo-filesystems (/dev, /proc, /sys) evolved organically over 30 years. They're fragile to parse (heuristic text scraping), prone to information leaks (rootkits hide in /proc), and inconsistent (each subsystem formats differently). They are also a leading source of container-escape bugs.

Decision

Replace pseudo-filesystems with explicit interfaces:

  • /Bus/: Device discovery via symlinks to driver NPI objects (e.g., /Bus/Net/eth0/Cell/Driver/net-intel/npi)
  • ProvChain ledger: Process introspection via formal append-only event log (not file scraping)
  • KDL manifests: System configuration via declarative files (not sysctl)
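The contrast with /proc scraping can be sketched concretely. The event shapes below are invented for illustration (not the actual ProvChain schema); the point is that introspection replays structured records rather than parsing text:

```python
# Hypothetical ProvChain-style entries: structured events, not text to scrape.
ledger = [
    {"seq": 0, "kind": "cell.start", "cell": "net-intel", "ts": 100},
    {"seq": 1, "kind": "cell.start", "cell": "shell",     "ts": 105},
    {"seq": 2, "kind": "cell.exit",  "cell": "shell",     "ts": 230},
]

def live_cells(events):
    """Replay the append-only log; no heuristic parsing of /proc text."""
    alive = set()
    for e in sorted(events, key=lambda e: e["seq"]):
        if e["kind"] == "cell.start":
            alive.add(e["cell"])
        elif e["kind"] == "cell.exit":
            alive.discard(e["cell"])
    return alive

assert live_cells(ledger) == {"net-intel"}
```

Because the log is append-only and ordered, any observer replaying it reaches the same answer, which is what makes the "observable via formal ledger" consequence below verifiable rather than heuristic.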

Alternatives Rejected

  • Keep /dev: Device node permission management is fragile, hot-plug handling awkward
  • Enhanced /proc: Encourages heuristic monitoring, harder to formalize and verify
  • Plan 9 namespaces: Elegant but requires discipline across the entire stack; hard in mixed systems

Consequences

  • Service discovery is declarative (stat() on /Bus/ reveals available hardware)
  • Device state observable via formal ledger (structured, not heuristic)
  • No pseudo-filesystem to spoof (eliminates entire class of rootkit techniques)
  • Legacy tools expecting /proc/cpuinfo need Membrane translation
  • Standard monitoring tools (htop, ps) need reimplementation

ST3: CBOR Wire Format

Status: Accepted

Context

Object serialization needs a binary format. Protocol Buffers require code generation and .proto files. MessagePack lacks schema semantics. JSON is text (2x larger, slower parsing). Nexus serializes objects extensively — CAS objects, ION Ring messages, ProvChain entries.

Decision

CBOR (RFC 8949) for all serialization:

  • Binary CIDs (BLAKE3 hashes) embedded as native CBOR bstr (byte strings)
  • Objects tagged with CBOR semantic tags (e.g., tag 42 for Fiber, tag 43 for Event)
  • Optional fields map to CBOR null (backward compatible without schema versioning)
  • Self-describing: readers auto-detect types via tags
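The two CBOR primitives this decision leans on (byte strings, major type 2, and semantic tags, major type 6) encode per RFC 8949's header rules. A minimal sketch restricted to short lengths, using tag 42 only because the bullet above uses it as an example:

```python
def cbor_bstr(data: bytes) -> bytes:
    # Major type 2 (byte string); this sketch handles short lengths only.
    n = len(data)
    if n <= 23:
        return bytes([0x40 | n]) + data
    if n <= 0xFF:
        return bytes([0x58, n]) + data
    raise ValueError("sketch handles short byte strings only")

def cbor_tag(tag: int, item: bytes) -> bytes:
    # Major type 6 (semantic tag) followed by the tagged item.
    if tag <= 23:
        return bytes([0xC0 | tag]) + item
    if tag <= 0xFF:
        return bytes([0xD8, tag]) + item
    raise ValueError("sketch handles small tags only")

cid = bytes(32)                         # 32-byte BLAKE3 digest placeholder
encoded = cbor_tag(42, cbor_bstr(cid))
# Header: 0xD8 0x2A (tag 42), 0x58 0x20 (byte string, length 32)
assert encoded[:4] == bytes([0xD8, 0x2A, 0x58, 0x20])
assert len(encoded) == 4 + 32
```

A reader that hits the `0xD8 0x2A` prefix knows the payload's type without any external schema, which is the self-describing property claimed above.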

Alternatives Rejected

  • Protocol Buffers: Requires .proto files, code generation build step, Google ecosystem dependency
  • MessagePack: No schema semantics, no standard way to embed cryptographic hashes
  • JSON: Text format, ~2x larger, slower parsing, no binary types
  • FlatBuffers: Zero-copy but complex schema management, limited language support

Consequences

  • No code generation needed (type information travels with the data)
  • Self-describing (new fields added without breaking old readers)
  • Backward compatible (optional fields = null)
  • Native binary type for CIDs (no hex encoding overhead)
  • Larger than hand-optimized binary formats (~10-15% overhead)
  • CBOR tooling less mature than Protocol Buffers ecosystem

N1: TCP/IP in Userland

Status: Accepted

Context

Linux puts TCP/IP in the kernel: complex, hard to modify, context switch overhead for every socket operation. Dedicated TCP offload hardware is expensive and inflexible. LwIP (Lightweight IP) is a BSD-licensed embedded stack with a core footprint around 20 KB.

Decision

LwIP runs in userland as part of the Membrane (libnexus.a):

  • Linked into every application binary as a static library
  • ION Rings carry raw Ethernet frames: NIC RX → kernel (NetSwitch) → app's LwIP instance
  • Kernel never parses IP headers — only L2 frame forwarding
  • Each application controls its own TCP behavior (congestion, windows, timeouts)
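The data path in the second bullet can be mocked in a few lines. The ring is modeled as a plain queue and the stack as a recording stub (a real build would link LwIP; these names are illustrative, not the libnexus.a API):

```python
from collections import deque

class AppNetStack:
    """Stand-in for a per-app LwIP instance: each app owns its own state."""
    def __init__(self):
        self.delivered = []
    def input_frame(self, frame: bytes):
        # A real stack would parse Ethernet/IP/TCP here; we just record it.
        self.delivered.append(frame)

# ION Ring stand-in: the kernel's NetSwitch enqueues raw L2 frames and the
# application drains them into its private stack, with no syscall per packet.
ion_ring = deque()
stack = AppNetStack()

ion_ring.append(b"\x00" * 14 + b"payload-1")   # fake Ethernet frames
ion_ring.append(b"\x00" * 14 + b"payload-2")

while ion_ring:
    stack.input_frame(ion_ring.popleft())

assert len(stack.delivered) == 2
```

Because each application holds its own `stack` instance, corrupting one app's TCP state cannot affect any other app, which is the isolation consequence listed below.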

Alternatives Rejected

  • Kernel TCP (Linux model): Context switch overhead, complex kernel code, one-size-fits-all configuration
  • Custom TCP in Nim: 5000+ lines of careful state machine; LwIP is proven and maintained
  • eBPF TCP: Bleeding edge, ecosystem immature, licensing complexity

Consequences

  • Per-app networking isolation (one app's corruption can't crash another's TCP stack)
  • No kernel context switch for network I/O
  • Apps can tune TCP independently (different congestion algorithms per workload)
  • Memory overhead: 50-100 KB TCP/IP code per application
  • Requires LwIP integration with fiber scheduler (no blocking syscalls)

N2: UTCP Over QUIC

Status: Accepted

Context

QUIC is designed for Internet-scale web traffic: 5000+ lines of state machine, complex connection migration, certificate management. Nexus clusters operate on local networks with known peers, exchanging messages (not streams). QUIC is massive overkill.

Decision

UTCP (sovereign transport) for Nexus-to-Nexus communication:

  • Identity-centric: SipHash-128 CellID addressing, not IP:port
  • Message-native: Datagram framing, not byte streams
  • NACK-based: Assume good network, retransmit only on gap detection
  • L2 fork: EtherType 0x88B5 (UTCP) vs. 0x0800 (IPv4) at NetSwitch
  • Survives interface migration (WiFi→Ethernet) without session loss

TCP/IP (via LwIP) remains available for Internet-facing traffic. UTCP handles intra-cluster only.
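The NACK-based bullet inverts TCP's ACK model: nothing is sent on the happy path, and retransmission is requested only when a sequence gap appears. A simplified receiver sketch (the framing and NACK transport are abstracted away; none of this is the UTCP wire format):

```python
# NACK-based delivery sketch: assume a good network, retransmit only on gaps.
def receive(frames, retransmit):
    """frames: iterable of (seq, payload); retransmit(seq) -> payload."""
    got = {}
    highest = -1
    for seq, payload in frames:
        got[seq] = payload
        if seq > highest + 1:
            for missing in range(highest + 1, seq):   # gap detected
                got[missing] = retransmit(missing)    # NACK + await resend
        highest = max(highest, seq)
    return [got[i] for i in range(highest + 1)]

sent = {0: b"a", 1: b"b", 2: b"c", 3: b"d"}
arrived = [(0, b"a"), (2, b"c"), (3, b"d")]           # frame 1 was lost
nacked = []

def retransmit(seq):
    nacked.append(seq)
    return sent[seq]

assert receive(arrived, retransmit) == [b"a", b"b", b"c", b"d"]
assert nacked == [1]                                  # only the gap was re-sent
```

On a low-loss local network this costs nothing in the common case; the corresponding consequence below is that the model degrades when loss is frequent.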

Alternatives Rejected

  • Full QUIC: 10x complexity for cluster communication; certificate management overhead
  • Raw TCP/UDP: Requires IP configuration, routing, ARP; not message-native
  • gRPC/HTTP: Application-layer complexity, inefficient binary serialization
  • Custom over UDP: Still requires an IP stack; UTCP operates at L2 directly

Consequences

  • Native message boundaries (no stream reassembly)
  • Direct CellID addressing (no DNS, no IP routing for local peers)
  • Far simpler state machine (~100 LOC vs. 5000+ for QUIC)
  • Session survives network interface changes
  • Limited to Nexus ecosystem (can't talk to legacy Internet services)
  • NACK model assumes low-loss network (degrades on congested WAN)

N3: L2 Switching Only

Status: Accepted

Context

Kernel-level IP routing adds complexity: routing tables, path selection, ARP caching, ICMP handling. Each piece is a security surface and a maintenance burden. In Nexus, each cell has its own LwIP instance — it handles its own IP.

Decision

The kernel (NetSwitch) operates at Layer 2 only:

  • Frame forwarding based on EtherType
  • 0x0800 (IPv4) / 0x86DD (IPv6) → route to cell's Membrane
  • 0x88B5 (UTCP) → route to UTCP handler fiber
  • 0x4C57 (LWF) → route to Libertaria Wire Frame handler
  • No IP parsing, no routing tables, no ARP cache in kernel
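The whole forwarding decision fits in one lookup on bytes 12-13 of the Ethernet header. A sketch of that dispatch, with a hypothetical handler registry standing in for NetSwitch's internal routing:

```python
import struct

# Hypothetical handler registry keyed by EtherType, mirroring the bullets above.
HANDLERS = {
    0x0800: "membrane-ipv4",
    0x86DD: "membrane-ipv6",
    0x88B5: "utcp-fiber",
    0x4C57: "lwf-handler",
}

def switch(frame: bytes) -> str:
    """Dispatch on the EtherType at offset 12; never parse beyond L2."""
    if len(frame) < 14:
        raise ValueError("runt frame")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype not in HANDLERS:
        raise ValueError(f"no handler for EtherType {ethertype:#06x}")
    return HANDLERS[ethertype]

# dst MAC (6) + src MAC (6) + EtherType 0x88B5 + payload
frame = b"\xff" * 6 + b"\xaa" * 6 + struct.pack("!H", 0x88B5) + b"payload"
assert switch(frame) == "utcp-fiber"
```

Everything after byte 13 is opaque to the switch, which is exactly why the kernel needs no routing tables, ARP cache, or IP parser.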

Alternatives Rejected

  • Kernel L3 routing: Couples cells together, harder to isolate, adds kernel complexity
  • No switching (direct NIC per cell): Requires multiple NICs or SR-IOV; not always available
  • Virtual switch (OVS-style): Learning tables, spanning tree, broadcast handling — overkill for a single host

Consequences

  • Kernel is IP-unaware (smaller, easier to verify, smaller attack surface)
  • Each cell's networking is fully independent
  • No routing table consistency problems
  • L2 broadcast domains may be large (ARP overhead on many-cell systems)
  • Multi-host networking requires a management plane (not kernel's job)