# There’s a specific moment in a system’s life where the question stops being “does it work?” and starts being “does it think?” Stage 8 is that moment for SCYTHE.
This release didn’t add more sensors. It didn’t add more endpoints. It made the system aware of *what the network is supposed to be doing* — and therefore hyper-sensitive to everything it isn’t.
---
## The Problem We Were Really Solving
Stage 7 left us with a speculative graph that could recognize semantic similarity between entities, promote edges that accumulated enough evidence, and stream live intelligence to the operator dashboard. That's good. But there was a structural gap.
The system was observing **transport behavior** without understanding **protocol intent**.
nDPI was scanning 655+ sessions per PCAP and coming back with `{'dns_names': 2, 'tls_snis': 0, 'http_hosts': 0}`. Nearly blind. When you see port 443 traffic with no TLS SNI, that's not an nDPI failure: it's a **signal**. When port 53 traffic has average frame sizes above 300 bytes, that's not noise: it's a DNS tunnel.
The difference between a system that drops that information and a system that uses it is the difference between an IDS and an intelligence platform.
Three more cracks were showing alongside this:
1. **Similarity search was tied to FAISS rebuild cycles.** Every streaming insert that needed neighbor lookup was competing with index state. There was no true online path.
2. **The attention engine was scoring edges on four components**, but none of them knew whether the underlying traffic was violating protocol expectations. A port-443 session with no TLS SNI had the same anomaly weight as clean HTTPS.
3. **SSE stream clients had no gap detection.** If a client missed five promotions during a reconnect, it silently fell behind. The dashboard looked live but wasn’t.
---
## Protocol Expectation Intelligence
The core insight from this stage: **IANA port assignments are a behavioral contract**. Port 443 promises TLS. Port 53 promises small DNS frames. Port 123 promises 76-byte NTP packets. When a session breaks that contract, the violation *is* the signal — no signature matching required.
We implemented `protocol_intel.py` — a stateless, data-oblivious IANA violation scorer.
### What it does
Nine violation detectors, each tuned to a specific class of contract breach:
| Violation | Trigger | Score |
|---|---|---|
| `missing_tls` | Port 443/993/995 with no TLS SNI observed | 0.35 |
| `dns_tunnel` | Port 53 avg frame > 300B | 0.55 |
| `oversized_ntp` | Port 123 avg frame > 76B | 0.50 |
| `wrong_transport` | TCP on UDP-only port or vice versa | 0.45 |
| `unexpected_dns` | DNS payload on a non-DNS port | 0.40 |
| `constant_size_c2` | Coefficient of variation < 5% on any variable-size port | 0.40 |
| `tcp_syn_only` | SYN with no ACK or RST (half-open probe / scan) | 0.30 |
| `tcp_rst_flood` | RST flood without SYN/ACK context | 0.35 |
| `risk_port` | Known-malicious ports: 4444, 31337, 6667, 9001, 5555 | dynamic |
The scorer accepts both `pcap_ingest.SessionData` objects and plain dicts, so it works at every layer of the pipeline — PCAP ingest, live stream events, and shadow edge metadata.
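As a sketch, a few of the detectors in the table can be expressed as one stateless scoring function. The function name and dict keys here (`score_session`, `dst_port`, `avg_frame_bytes`, `tls_sni`) are illustrative assumptions, not the real `protocol_intel.py` API; only the thresholds and scores come from the table above:

```python
# Illustrative port-contract scorer in the spirit of protocol_intel.py.
# Field names are assumed; thresholds and scores mirror the table above.
TLS_PORTS = {443, 993, 995}

def score_session(session: dict) -> tuple:
    """Return (anomaly_score, violations) for a session dict."""
    score, violations = 0.0, []
    port = session.get("dst_port")
    avg = session.get("avg_frame_bytes", 0)

    if port in TLS_PORTS and not session.get("tls_sni"):
        score += 0.35
        violations.append("missing_tls")
    if port == 53 and avg > 300:        # DNS frames should stay small
        score += 0.55
        violations.append("dns_tunnel")
    if port == 123 and avg > 76:        # NTP packets are fixed-size
        score += 0.50
        violations.append("oversized_ntp")
    return min(score, 1.0), violations
```

Because the input is a plain dict, the same function works at any pipeline layer, which is the property the real scorer exploits.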
### What this unlocks
This is the detection class that signature-based systems miss entirely:
```
Port 443 TCP
   ↓
No TLS ClientHello observed
   ↓
Score: 0.35 (missing_tls)

Same actor, port 53 UDP
   ↓
Avg frame: 412 bytes
   ↓
Score: 0.55 (dns_tunnel)

Combined: 0.90 → HOT tier immediately
```
Neither of those sessions looks malicious by payload. You can’t fingerprint them. But the *behavior* violates protocol expectations — and now the system knows.
### Wired everywhere
`protocol_anomaly_score` and `protocol_violations` are now stamped onto:
- Every session node in the hypergraph (via `pcap_ingest.emit_session()`)
- Every speculative edge from the live ingest worker
- The attention score for every edge in ShadowGraph
---
## Attention Engine: Five-Component Scoring
The original attention formula had four weights summing to 1.0:
```
attention = conf(0.40) + evidence(0.30) + recency(0.20) + proxy_anomaly(0.10)
```
The `proxy_anomaly` term was an approximation — it used observation count and unmet requires as a stand-in for true anomaly signal.
Now there are five:
```
attention = conf(0.37) + evidence(0.28) + recency(0.18) + proxy_anomaly(0.07) + protocol_anomaly(0.10)
```
`W_PROTO = 0.10` pulls directly from `ProtocolIntel` violation scores. `AttentionResult` now carries a `protocol_anomaly` field, and the engine looks for the score in three dict paths — the session node labels, the edge context, and the top-level edge dict — so it fires regardless of which pipeline stage produced the edge.
The effect: a port-443 session with no TLS and constant-size packets that was previously `WARM` tier is now `HOT` on first observation. No waiting for observation accumulation.
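In sketch form, the five-component score is just a weighted sum of normalized [0, 1] components. The dict field names are assumptions for illustration; the weights are the ones above:

```python
# Five-component attention sketch; weights come from the formula above,
# field names are assumed.
W = {"conf": 0.37, "evidence": 0.28, "recency": 0.18,
     "proxy_anomaly": 0.07, "protocol_anomaly": 0.10}

def attention(edge: dict) -> float:
    """Weighted sum; missing components default to 0."""
    return sum(w * edge.get(k, 0.0) for k, w in W.items())

# The same edge with and without a protocol violation attached:
clean = {"conf": 0.6, "evidence": 0.5, "recency": 0.9}
hot = dict(clean, protocol_anomaly=0.9)
# attention(hot) - attention(clean) == 0.10 * 0.9 = 0.09, enough to
# push a borderline edge across a tier boundary on first observation.
```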
---
## TurboQuant: Eliminating the Index Rebuild Problem
The similarity search architecture had a fundamental mismatch with streaming ingest. FAISS `IndexFlatL2` is excellent for batch workloads — you build an index, you query it. For streaming, you’re constantly inserting while querying. Every new entity is a potential search target immediately, but FAISS has no true online insert path; each `add()` modifies state that concurrent searches read under a shared lock.
We evaluated the paper *"TurboQuant: Near-Optimal Vector Quantization"* (arXiv 2504.19874) and found it maps precisely to this workload.
### What TurboQuant does
Three steps, done once per vector:
1. **Random rotation** — statistically independent coordinates after QR decomposition
2. **Scalar quantization** — optimal per-coordinate quantization against a pre-computed Beta distribution codebook
3. **QJL correction** (TurboQuantIP) — 1-bit residual quantization that removes inner product bias introduced by Stage 1
The result: inner-product-optimal compression at 3 bits per dimension with near-zero indexing time.
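A toy version of steps 1 and 2 (QR-based random rotation, then per-coordinate scalar quantization) shows the mechanics. This is a sketch, not TurboQuant itself: the Beta-distribution codebook and the QJL residual correction are omitted, and a crude per-vector uniform scale stands in for the real codebook:

```python
import numpy as np

rng = np.random.default_rng(42)
DIM, BITS = 64, 3
# Step 1: random rotation from QR decomposition of a Gaussian matrix
Q, _ = np.linalg.qr(rng.normal(size=(DIM, DIM)))

def encode(v, bits=BITS):
    r = Q @ (v / np.linalg.norm(v))       # rotate the unit vector
    scale = np.abs(r).max()               # stand-in for a real codebook
    levels = 2 ** bits - 1
    # Step 2: uniform scalar quantization to 3 bits per coordinate
    codes = np.round((r / scale + 1) / 2 * levels).astype(np.uint8)
    return codes, scale

def decode(codes, scale, bits=BITS):
    levels = 2 ** bits - 1
    r = (codes / levels * 2 - 1) * scale
    return Q.T @ r                        # rotate back
```

Even this naive variant reconstructs vectors with high cosine fidelity at 3 bits per dimension; the rotation is what makes per-coordinate quantization safe.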
### TurboQuantStore
We built `turbo_quant_store.py` as a thread-safe streaming vector store wrapping TurboQuantIP:
```
add(entity_id, vec)  → encode → append to fp16 dense cache    O(dim)
search(query, k=10)  → normalize → fp16 matmul vs dense cache O(N·dim/16)
```
The fp16 dense matrix is the search primitive — no graph structure, no index rebuild, no lock contention. Pure `torch.mm`.
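A numpy stand-in makes the structure concrete: append-only normalized rows, one matmul per query, nothing to rebuild. `DenseStore` is illustrative, not the real `turbo_quant_store.py` API, and a production store would preallocate rather than `vstack`:

```python
import numpy as np

class DenseStore:
    """Toy append-only fp16 store searched by a single matmul."""
    def __init__(self, dim: int):
        self.ids = []
        self.mat = np.empty((0, dim), dtype=np.float16)

    def add(self, entity_id: str, vec: np.ndarray) -> None:
        v = (vec / np.linalg.norm(vec)).astype(np.float16)
        self.mat = np.vstack([self.mat, v])   # real store: preallocate
        self.ids.append(entity_id)

    def search(self, query: np.ndarray, k: int = 10):
        q = (query / np.linalg.norm(query)).astype(np.float16)
        sims = self.mat @ q                    # one matmul, no index
        top = np.argsort(-sims)[:k]
        return [(self.ids[i], float(sims[i])) for i in top]
```

Inserts never invalidate prior state, so readers and writers only contend on the append itself.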
**Benchmarks on this hardware:**
| Metric | Value |
|---|---|
| Encode 10,000 vectors | 198ms (one-time, not per-query) |
| Search top-20 over 10,000 vecs | 1.54ms |
| Search top-10 over 100 vecs | 0.03ms |
| Memory per vector (fp16 active) | 1.5KB vs 3KB fp32 (2× compression) |
| Compression vs fp32 codes | 7.5MB for 10k vecs vs 30MB fp32 |
The numpy 2.x compatibility shim (`np.trapz → np.trapezoid`) is applied at import time so downstream code is unaffected.
### Wired into SemanticShadow
TurboQuantStore is now the **primary similarity search backend** in `semantic_shadow.py`:
```
_embed_with_delta()
  → blend alpha=0.80 (identity continuity)
  → _tq_store.add(entity_id, blended_vec)   ← mirrors into fp16 cache

process_entity()
  → if len(_tq_store) > 1:
        results = _tq_store.search(vec, k=15)   ← TQ primary
    else:
        results = ee.search_similar(…)          ← FAISS fallback

get_pca_coords()
  → reads from _entity_vecs directly (fp32, always most current)
  → no longer loops through FAISS reconstruct()
```
FAISS remains as the cold-start fallback and persistence layer. TurboQuant handles all hot-path similarity.
---
## Delta Embeddings + Identity Continuity
One of the deeper problems in network intelligence: the same actor rotates IPs, changes ports, cycles ASNs. Treated as independent entities, they look like noise. Treated as a single evolving identity, they’re a pattern.
`_embed_with_delta()` applies exponential identity blending:
```python
blended = 0.80 * old_vec + 0.20 * fresh_embed
blended /= norm(blended)  # re-normalize to unit sphere
```
Alpha=0.80 means each new observation shifts identity by only 20%. An entity’s embedding is a *weighted average of its entire observation history*, decaying toward the present. This is the LLM KV-cache insight applied to graph entities: you don’t discard context, you blend it.
Combined with TurboQuantStore, this gives us:
- Rotating IPs that share behavioral context → near-identical blended vecs → high cosine similarity → speculative edge
- Different actors on the same port → diverging blended vecs → no spurious edge
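A quick numeric sketch of why the blending converges: repeated observations of the same underlying behavior pull an entity's embedding almost entirely onto that behavior vector, wherever it started. Here `behavior` is a stand-in for a stable fresh embedding:

```python
import numpy as np

ALPHA = 0.80   # identity weight from the formula above

def blend(old: np.ndarray, fresh: np.ndarray) -> np.ndarray:
    v = ALPHA * old + (1 - ALPHA) * fresh
    return v / np.linalg.norm(v)          # back onto the unit sphere

rng = np.random.default_rng(0)
behavior = rng.normal(size=768)
behavior /= np.linalg.norm(behavior)
entity = rng.normal(size=768)             # unrelated starting identity
entity /= np.linalg.norm(entity)

for _ in range(30):                       # 30 observations, same behavior
    entity = blend(entity, behavior)
# cosine(entity, behavior) is now > 0.99: identity has converged
```

Each step only moves the identity 20% of the way, so a single anomalous observation barely perturbs it, while a sustained behavioral shift eventually dominates.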
---
## SSE Hardening: Sequence Numbers and Backpressure
The `/stream/speculative` SSE endpoint now has production-grade delivery guarantees:
**Sequence numbers.** Every event carries `id: {seq}`. On reconnect, the client sends `Last-Event-ID: N` and the server checks the gap. If `current_seq - last_seen > 1`, a synthetic `event: resync` frame fires immediately and the client re-bootstraps from `/api/shadow/edges`.
**Bounded subscriber queues.** Each connected client gets an `_SseSubscriber` with a `queue.Queue(maxsize=500)`. Slow clients can’t block fast ones. `drop_count` is incremented on overflow and reported in heartbeat frames.
**Deadlock-safe sequence counters.** The sequence counter uses a dedicated `_seq_lock` separate from the graph `_lock`. This avoids the deadlock where `push()` held `_lock` while `_notify_delta()` tried to acquire it to increment `_seq`.
**Browser-side gap handling.** The frontend’s `_watchPromotions()` handles `resync` (full re-bootstrap from REST) and `heartbeat` (drop_count warning banner if > 0). Reconnect delay is advertised in `retry: 3000`.
---
## MMR Neighbor Selection
The speculative graph was accumulating redundant cluster links. A CDN subnet with 50 near-identical IPs was generating O(n²) cross-links — semantically useless, computationally expensive.
**Maximal Marginal Relevance (MMR)** selects diverse-yet-relevant neighbors from the candidate pool:
```
MMR = λ × similarity_to_query
    − (1 − λ) × max_similarity_to_already_selected
```
With λ=0.55, the algorithm is slightly relevance-biased but actively avoids selecting candidates that are redundant with those already chosen. MAX_NEIGHBORS=5 per entity instead of O(n). CDN subnets that previously flooded the graph with 50-node clusters now contribute 5 structurally diverse speculative edges.
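A minimal MMR selector over unit vectors, with λ=0.55 and k=5 as defaults per the text (function and variable names are made up for illustration):

```python
import numpy as np

def mmr_select(query, candidates, lam=0.55, k=5):
    """candidates: dict of id -> unit vector. Greedy MMR selection."""
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        def score(cid):
            rel = float(candidates[cid] @ query)            # relevance
            red = max((float(candidates[cid] @ candidates[s])
                       for s in selected), default=0.0)     # redundancy
            return lam * rel - (1 - lam) * red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two duplicate candidates and one diverse candidate at equal relevance, the duplicate's redundancy penalty pushes the diverse one ahead on the second pick.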
---
## Ollama Remote: Embedding at Network Scale
The embedding model (`nomic-embed-text`, 768-dim) now runs on `neurosphere`, an AlmaLinux 9 VM with an RTX 3060 (12GB VRAM, compute capability 8.6). Ollama is bound to `0.0.0.0:11434` via a systemd override and reachable across the LAN at `192.168.1.185`.
`scythe_orchestrator.py` accepts `--ollama-url` and propagates `OLLAMA_URL` to all spawned subprocesses. Cold-load time for `nomic-embed-text` after GPU eviction: ~8 seconds. Embedding throughput is now network I/O bound, not compute bound.
FAISS AVX2 runs locally (Python process on WSL2). The separation is correct: GPU embeds, CPU indexes and searches.
---
## BehavioralFingerprint: The Foundation for Cross-Domain Tracking
`protocol_intel.py` includes `BehavioralFingerprint` — a 22-dimensional statistical feature vector computed per session:
```
[avg_pkt_bytes, std_pkt_bytes, CV, median_pkt_bytes,
 avg_iat_ms, std_iat_ms, pkt_count, total_bytes,
 duration_sec, bytes_per_sec, dst_port, src_port,
 proto_tcp, proto_udp, proto_icmp,
 has_tls, has_dns, has_http,
 tcp_flag_syn, tcp_flag_rst, tcp_flag_fin,
 anomaly_score]
```
This vector is normalized to [0, 1] per dimension and designed for future fusion into delta embeddings:
```python
fused = concat(text_embedding_768, fingerprint_22)
```
When that fusion is complete, cosine similarity will measure *behavioral identity* — not just semantic text proximity. The same actor across different IPs, different ports, and different protocols will cluster in the fused space because their *statistical behavior* is similar.
This is the “same actor, different IP” detection problem solved at the vector level.
---
## Deck.gl Visualization: Dual-Layer Intelligence
The frontend now renders two semantically distinct flow layers simultaneously, plus two supporting overlays:
- **Validated flows** (cyan, `conf ≥ 0.75`) — promoted edges with full evidence
- **Speculative flows** (amber, pulsing alpha) — shadow graph edges still accumulating evidence
- **Semantic ScatterplotLayer** (magenta ghost nodes) — PCA-projected entity embeddings anchored to geography
- **Promotion flash** — white expanding ring animation when an edge crosses the promotion threshold
The `🔀 FUSION` mode toggle cycles between validated-only, speculative-only, and both layers. This lets the operator distinguish confirmed intelligence from hypothesis-level signal.
---
## Architecture State
```
Raw PCAP / Live Stream
  ↓
ProtocolIntel            ← IANA violation scoring (NEW)
BehavioralFingerprint    ← Statistical identity vector (NEW)
  ↓
pcap_ingest.emit_session()  ← Stamps protocol_anomaly_score on hypergraph nodes (NEW)
  ↓
EmbeddingEngine (Ollama) ← nomic-embed-text 768-dim on neurosphere RTX 3060
_embed_with_delta()      ← α=0.80 identity blending (NEW)
  ↓
TurboQuantStore          ← 3-bit fp16 streaming index, 0.03ms search (NEW)
  ↓
SemanticShadow.process_entity()
  → TurboQuant search    ← Primary (NEW)
  → FAISS search         ← Fallback
  → MMR selection        ← Diversity filter (NEW)
  → Temporal decay       ← Age-weighted bumps (NEW)
  ↓
ShadowGraph              ← Speculative edge store
  → AttentionEngine      ← 5-component scoring incl. W_PROTO (NEW)
  → SSE stream           ← Seq numbers + resync + backpressure (NEW)
  ↓
Deck.gl Frontend         ← Dual-layer validated/speculative rendering (NEW)
```
---
## What’s Next
The `BehavioralFingerprint` 22-dim vector is ready. The TurboQuantStore `fingerprint_store()` singleton is waiting. The missing step is the fusion path:
```python
fused = concat(embed_768, fingerprint_22)  # 790-dim
_tq_store.add(entity_id, fused)
```
Once that’s wired, the similarity search finds actors by *what they do*, not just *what they’re described as*. That closes the “same actor, different IP” loop.
The Android ScytheCommandApp is the other frontier — all of these intelligence layers need a mobile tactical interface that can run disconnected from the LAN and resync over Tailscale. The project scaffold, MainActivity, and mDNS discovery layer are next.
The system now has:
- A memory that forgets slowly (delta embeddings)
- A sensor that detects intent violations (protocol_intel)
- A compressor that thinks at cache speed (TurboQuant)
- An attention system that knows what matters (AttentionEngine)
The next stage is giving it a hand.
---
*NerfEngine is a local-first, RF-aware tactical intelligence platform. All inference runs on local hardware. No data leaves the edge.*
