# There’s a specific moment in a system’s life where the question stops being “does it work?” and starts being “does it think?” Stage 8 is that moment for SCYTHE.
This release didn’t add more sensors. It didn’t add more endpoints. It made the system aware of *what the network is supposed to be doing* — and therefore hyper-sensitive to everything it isn’t.
---
## The Problem We Were Really Solving
Stage 7 left us with a speculative graph that could recognize semantic similarity between entities, promote edges that accumulated enough evidence, and stream live intelligence to the operator dashboard. That's good. But there was a structural gap.
The system was observing **transport behavior** without understanding **protocol intent**.
nDPI was scanning 655+ sessions per PCAP and coming back with `{'dns_names': 2, 'tls_snis': 0, 'http_hosts': 0}`. Nearly blind. When you see port 443 traffic with no TLS SNI, that's not an nDPI failure: it's a **signal**. When port 53 traffic has average frame sizes above 300 bytes, that's not noise: it's a DNS tunnel.
The difference between a system that drops that information and a system that uses it is the difference between an IDS and an intelligence platform.
Three more cracks were showing alongside this:
1. **Similarity search was tied to FAISS rebuild cycles.** Every streaming insert that needed neighbor lookup was competing with index state. There was no true online path.
2. **The attention engine was scoring edges on four components**, but none of them knew whether the underlying traffic was violating protocol expectations. A port-443 session with no TLS SNI had the same anomaly weight as clean HTTPS.
3. **SSE stream clients had no gap detection.** If a client missed five promotions during a reconnect, it silently fell behind. The dashboard looked live but wasn’t.
---
## Protocol Expectation Intelligence
The core insight from this stage: **IANA port assignments are a behavioral contract**. Port 443 promises TLS. Port 53 promises small DNS frames. Port 123 promises 76-byte NTP packets. When a session breaks that contract, the violation *is* the signal — no signature matching required.
We implemented `protocol_intel.py` — a stateless, data-oblivious IANA violation scorer.
### What it does
Nine violation detectors, each tuned to a specific class of contract breach:
| Violation | Trigger | Score |
|---|---|---|
| `missing_tls` | Port 443/993/995 with no TLS SNI observed | 0.35 |
| `dns_tunnel` | Port 53 avg frame > 300B | 0.55 |
| `oversized_ntp` | Port 123 avg frame > 76B | 0.50 |
| `wrong_transport` | TCP on UDP-only port or vice versa | 0.45 |
| `unexpected_dns` | DNS payload on a non-DNS port | 0.40 |
| `constant_size_c2` | Coefficient of variation < 5% on any variable-size port | 0.40 |
| `tcp_syn_only` | SYN with no ACK or RST (half-open probe / scan) | 0.30 |
| `tcp_rst_flood` | RST flood without SYN/ACK context | 0.35 |
| `risk_port` | Known-malicious ports: 4444, 31337, 6667, 9001, 5555 | dynamic |
The scorer accepts both `pcap_ingest.SessionData` objects and plain dicts, so it works at every layer of the pipeline — PCAP ingest, live stream events, and shadow edge metadata.
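As a sketch, a few of the detectors in the table can be expressed as one stateless scoring function. The function name and dict keys here (`score_session`, `dst_port`, `avg_frame_bytes`, `tls_sni`) are illustrative assumptions, not the real `protocol_intel.py` API; only the thresholds and scores come from the table above:

```python
# Illustrative port-contract scorer in the spirit of protocol_intel.py.
# Field names are assumed; thresholds and scores mirror the table above.
TLS_PORTS = {443, 993, 995}

def score_session(session: dict) -> tuple:
    """Return (anomaly_score, violations) for a session dict."""
    score, violations = 0.0, []
    port = session.get("dst_port")
    avg = session.get("avg_frame_bytes", 0)

    if port in TLS_PORTS and not session.get("tls_sni"):
        score += 0.35
        violations.append("missing_tls")
    if port == 53 and avg > 300:        # DNS frames should stay small
        score += 0.55
        violations.append("dns_tunnel")
    if port == 123 and avg > 76:        # NTP packets are fixed-size
        score += 0.50
        violations.append("oversized_ntp")
    return min(score, 1.0), violations
```

Because the input is a plain dict, the same function works at any pipeline layer, which is the property the real scorer exploits.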
### What this unlocks
This is the detection class that signature-based systems miss entirely:
```
Port 443 TCP
   ↓
No TLS ClientHello observed
   ↓
Score: 0.35 (missing_tls)

Same actor, port 53 UDP
   ↓
Avg frame: 412 bytes
   ↓
Score: 0.55 (dns_tunnel)

Combined: 0.90 → HOT tier immediately
```
Neither of those sessions looks malicious by payload. You can’t fingerprint them. But the *behavior* violates protocol expectations — and now the system knows.
### Wired everywhere
`protocol_anomaly_score` and `protocol_violations` are now stamped onto:
- Every session node in the hypergraph (via `pcap_ingest.emit_session()`)
- Every speculative edge from the live ingest worker
- The attention score for every edge in ShadowGraph
---
## Attention Engine: Five-Component Scoring
The original attention formula had four weights summing to 1.0:
```
attention = conf(0.40) + evidence(0.30) + recency(0.20) + proxy_anomaly(0.10)
```
The `proxy_anomaly` term was an approximation — it used observation count and unmet requires as a stand-in for true anomaly signal.
Now there are five:
```
attention = conf(0.37) + evidence(0.28) + recency(0.18) + proxy_anomaly(0.07) + protocol_anomaly(0.10)
```
`W_PROTO = 0.10` pulls directly from `ProtocolIntel` violation scores. `AttentionResult` now carries a `protocol_anomaly` field, and the engine looks for the score in three dict paths — the session node labels, the edge context, and the top-level edge dict — so it fires regardless of which pipeline stage produced the edge.
The effect: a port-443 session with no TLS and constant-size packets that was previously `WARM` tier is now `HOT` on first observation. No waiting for observation accumulation.
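In sketch form, the five-component score is just a weighted sum of normalized [0, 1] components. The dict field names are assumptions for illustration; the weights are the ones above:

```python
# Five-component attention sketch; weights come from the formula above,
# field names are assumed.
W = {"conf": 0.37, "evidence": 0.28, "recency": 0.18,
     "proxy_anomaly": 0.07, "protocol_anomaly": 0.10}

def attention(edge: dict) -> float:
    """Weighted sum; missing components default to 0."""
    return sum(w * edge.get(k, 0.0) for k, w in W.items())

# The same edge with and without a protocol violation attached:
clean = {"conf": 0.6, "evidence": 0.5, "recency": 0.9}
hot = dict(clean, protocol_anomaly=0.9)
# attention(hot) - attention(clean) == 0.10 * 0.9 = 0.09, enough to
# push a borderline edge across a tier boundary on first observation.
```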
---
## TurboQuant: Eliminating the Index Rebuild Problem
The similarity search architecture had a fundamental mismatch with streaming ingest. FAISS `IndexFlatL2` is excellent for batch workloads — you build an index, you query it. For streaming, you’re constantly inserting while querying. Every new entity is a potential search target immediately, but FAISS has no true online insert path; each `add()` modifies state that concurrent searches read under a shared lock.
We evaluated the paper *"TurboQuant: Near-Optimal Vector Quantization"* (arXiv 2504.19874) and found it maps precisely to this workload.
### What TurboQuant does
Three steps, done once per vector:
1. **Random rotation** — statistically independent coordinates after QR decomposition
2. **Scalar quantization** — optimal per-coordinate quantization against a pre-computed Beta distribution codebook
3. **QJL correction** (TurboQuantIP) — 1-bit residual quantization that removes inner product bias introduced by Stage 1
The result: inner-product-optimal compression at 3 bits per dimension with near-zero indexing time.
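A toy version of steps 1 and 2 (QR-based random rotation, then per-coordinate scalar quantization) shows the mechanics. This is a sketch, not TurboQuant itself: the Beta-distribution codebook and the QJL residual correction are omitted, and a crude per-vector uniform scale stands in for the real codebook:

```python
import numpy as np

rng = np.random.default_rng(42)
DIM, BITS = 64, 3
# Step 1: random rotation from QR decomposition of a Gaussian matrix
Q, _ = np.linalg.qr(rng.normal(size=(DIM, DIM)))

def encode(v, bits=BITS):
    r = Q @ (v / np.linalg.norm(v))       # rotate the unit vector
    scale = np.abs(r).max()               # stand-in for a real codebook
    levels = 2 ** bits - 1
    # Step 2: uniform scalar quantization to 3 bits per coordinate
    codes = np.round((r / scale + 1) / 2 * levels).astype(np.uint8)
    return codes, scale

def decode(codes, scale, bits=BITS):
    levels = 2 ** bits - 1
    r = (codes / levels * 2 - 1) * scale
    return Q.T @ r                        # rotate back
```

Even this naive variant reconstructs vectors with high cosine fidelity at 3 bits per dimension; the rotation is what makes per-coordinate quantization safe.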
### TurboQuantStore
We built `turbo_quant_store.py` as a thread-safe streaming vector store wrapping TurboQuantIP:
```
add(entity_id, vec)  → encode → append to fp16 dense cache    O(dim)
search(query, k=10)  → normalize → fp16 matmul vs dense cache O(N·dim/16)
```
The fp16 dense matrix is the search primitive — no graph structure, no index rebuild, no lock contention. Pure `torch.mm`.
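A numpy stand-in makes the structure concrete: append-only normalized rows, one matmul per query, nothing to rebuild. `DenseStore` is illustrative, not the real `turbo_quant_store.py` API, and a production store would preallocate rather than `vstack`:

```python
import numpy as np

class DenseStore:
    """Toy append-only fp16 store searched by a single matmul."""
    def __init__(self, dim: int):
        self.ids = []
        self.mat = np.empty((0, dim), dtype=np.float16)

    def add(self, entity_id: str, vec: np.ndarray) -> None:
        v = (vec / np.linalg.norm(vec)).astype(np.float16)
        self.mat = np.vstack([self.mat, v])   # real store: preallocate
        self.ids.append(entity_id)

    def search(self, query: np.ndarray, k: int = 10):
        q = (query / np.linalg.norm(query)).astype(np.float16)
        sims = self.mat @ q                    # one matmul, no index
        top = np.argsort(-sims)[:k]
        return [(self.ids[i], float(sims[i])) for i in top]
```

Inserts never invalidate prior state, so readers and writers only contend on the append itself.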
**Benchmarks on this hardware:**
| Metric | Value |
|---|---|
| Encode 10,000 vectors | 198ms (one-time, not per-query) |
| Search top-20 over 10,000 vecs | 1.54ms |
| Search top-10 over 100 vecs | 0.03ms |
| Memory per vector (fp16 active) | 1.5KB vs 3KB fp32 (2× compression) |
| Compression vs fp32 codes | 7.5MB for 10k vecs vs 30MB fp32 |
The numpy 2.x compatibility shim (`np.trapz → np.trapezoid`) is applied at import time so downstream code is unaffected.
### Wired into SemanticShadow
TurboQuantStore is now the **primary similarity search backend** in `semantic_shadow.py`:
```
_embed_with_delta()
  → blend alpha=0.80 (identity continuity)
  → _tq_store.add(entity_id, blended_vec)   ← mirrors into fp16 cache

process_entity()
  → if len(_tq_store) > 1:
        results = _tq_store.search(vec, k=15)   ← TQ primary
    else:
        results = ee.search_similar(…)          ← FAISS fallback

get_pca_coords()
  → reads from _entity_vecs directly (fp32, always most current)
  → no longer loops through FAISS reconstruct()
```
FAISS remains as the cold-start fallback and persistence layer. TurboQuant handles all hot-path similarity.
---
## Delta Embeddings + Identity Continuity
One of the deeper problems in network intelligence: the same actor rotates IPs, changes ports, cycles ASNs. Treated as independent entities, they look like noise. Treated as a single evolving identity, they’re a pattern.
`_embed_with_delta()` applies exponential identity blending:
```python
blended = 0.80 * old_vec + 0.20 * fresh_embed
blended /= norm(blended)  # re-normalize to unit sphere
```
Alpha=0.80 means each new observation shifts identity by only 20%. An entity’s embedding is a *weighted average of its entire observation history*, decaying toward the present. This is the LLM KV-cache insight applied to graph entities: you don’t discard context, you blend it.
Combined with TurboQuantStore, this gives us:
- Rotating IPs that share behavioral context → near-identical blended vecs → high cosine similarity → speculative edge
- Different actors on the same port → diverging blended vecs → no spurious edge
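A quick numeric sketch of why the blending converges: repeated observations of the same underlying behavior pull an entity's embedding almost entirely onto that behavior vector, wherever it started. Here `behavior` is a stand-in for a stable fresh embedding:

```python
import numpy as np

ALPHA = 0.80   # identity weight from the formula above

def blend(old: np.ndarray, fresh: np.ndarray) -> np.ndarray:
    v = ALPHA * old + (1 - ALPHA) * fresh
    return v / np.linalg.norm(v)          # back onto the unit sphere

rng = np.random.default_rng(0)
behavior = rng.normal(size=768)
behavior /= np.linalg.norm(behavior)
entity = rng.normal(size=768)             # unrelated starting identity
entity /= np.linalg.norm(entity)

for _ in range(30):                       # 30 observations, same behavior
    entity = blend(entity, behavior)
# cosine(entity, behavior) is now > 0.99: identity has converged
```

Each step only moves the identity 20% of the way, so a single anomalous observation barely perturbs it, while a sustained behavioral shift eventually dominates.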
---
## SSE Hardening: Sequence Numbers and Backpressure
The `/stream/speculative` SSE endpoint now has production-grade delivery guarantees:
**Sequence numbers.** Every event carries `id: {seq}`. On reconnect, the client sends `Last-Event-ID: N` and the server checks the gap. If `current_seq - last_seen > 1`, a synthetic `event: resync` frame fires immediately and the client re-bootstraps from `/api/shadow/edges`.
**Bounded subscriber queues.** Each connected client gets an `_SseSubscriber` with a `queue.Queue(maxsize=500)`. Slow clients can’t block fast ones. `drop_count` is incremented on overflow and reported in heartbeat frames.
**Deadlock-safe sequence counters.** The sequence counter uses a dedicated `_seq_lock` separate from the graph `_lock`. This avoids the deadlock where `push()` held `_lock` while `_notify_delta()` tried to acquire it to increment `_seq`.
**Browser-side gap handling.** The frontend’s `_watchPromotions()` handles `resync` (full re-bootstrap from REST) and `heartbeat` (drop_count warning banner if > 0). Reconnect delay is advertised in `retry: 3000`.
---
## MMR Neighbor Selection
The speculative graph was accumulating redundant cluster links. A CDN subnet with 50 near-identical IPs was generating O(n²) cross-links — semantically useless, computationally expensive.
**Maximal Marginal Relevance (MMR)** selects diverse-yet-relevant neighbors from the candidate pool:
```
MMR = λ × similarity_to_query
    − (1 − λ) × max_similarity_to_already_selected
```
With λ=0.55, the algorithm is slightly relevance-biased but actively avoids selecting candidates that are redundant with those already chosen. MAX_NEIGHBORS=5 per entity instead of O(n). CDN subnets that previously flooded the graph with 50-node clusters now contribute 5 structurally diverse speculative edges.
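A minimal MMR selector over unit vectors, with λ=0.55 and k=5 as defaults per the text (function and variable names are made up for illustration):

```python
import numpy as np

def mmr_select(query, candidates, lam=0.55, k=5):
    """candidates: dict of id -> unit vector. Greedy MMR selection."""
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        def score(cid):
            rel = float(candidates[cid] @ query)            # relevance
            red = max((float(candidates[cid] @ candidates[s])
                       for s in selected), default=0.0)     # redundancy
            return lam * rel - (1 - lam) * red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two duplicate candidates and one diverse candidate at equal relevance, the duplicate's redundancy penalty pushes the diverse one ahead on the second pick.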
---
## Ollama Remote: Embedding at Network Scale
The embedding model (`nomic-embed-text`, 768-dim) now runs on `neurosphere`, an AlmaLinux 9 VM with an RTX 3060 (12GB VRAM, compute capability 8.6). Ollama is bound to `0.0.0.0:11434` via a systemd override and reachable across the LAN at `192.168.1.185`.
`scythe_orchestrator.py` accepts `--ollama-url` and propagates `OLLAMA_URL` to all spawned subprocesses. Cold-load time for `nomic-embed-text` after GPU eviction: ~8 seconds. Embedding throughput is now network I/O bound, not compute bound.
FAISS AVX2 runs locally (Python process on WSL2). The separation is correct: GPU embeds, CPU indexes and searches.
---
## BehavioralFingerprint: The Foundation for Cross-Domain Tracking
`protocol_intel.py` includes `BehavioralFingerprint` — a 22-dimensional statistical feature vector computed per session:
```
[avg_pkt_bytes, std_pkt_bytes, CV, median_pkt_bytes,
 avg_iat_ms, std_iat_ms, pkt_count, total_bytes,
 duration_sec, bytes_per_sec, dst_port, src_port,
 proto_tcp, proto_udp, proto_icmp,
 has_tls, has_dns, has_http,
 tcp_flag_syn, tcp_flag_rst, tcp_flag_fin,
 anomaly_score]
```
This vector is normalized to [0, 1] per dimension and designed for future fusion into delta embeddings:
```python
fused = concat(text_embedding_768, fingerprint_22)
```
When that fusion is complete, cosine similarity will measure *behavioral identity* — not just semantic text proximity. The same actor across different IPs, different ports, and different protocols will cluster in the fused space because their *statistical behavior* is similar.
This is the “same actor, different IP” detection problem solved at the vector level.
---
## Deck.gl Visualization: Dual-Layer Intelligence
The frontend now renders two semantically distinct flow layers simultaneously, plus two supporting overlays:
- **Validated flows** (cyan, `conf ≥ 0.75`) — promoted edges with full evidence
- **Speculative flows** (amber, pulsing alpha) — shadow graph edges still accumulating evidence
- **Semantic ScatterplotLayer** (magenta ghost nodes) — PCA-projected entity embeddings anchored to geography
- **Promotion flash** — white expanding ring animation when an edge crosses the promotion threshold
The `🔀 FUSION` mode toggle cycles between validated-only, speculative-only, and both layers. This lets the operator distinguish confirmed intelligence from hypothesis-level signal.
---
## Architecture State
```
Raw PCAP / Live Stream
  ↓
ProtocolIntel            ← IANA violation scoring (NEW)
BehavioralFingerprint    ← Statistical identity vector (NEW)
  ↓
pcap_ingest.emit_session()  ← Stamps protocol_anomaly_score on hypergraph nodes (NEW)
  ↓
EmbeddingEngine (Ollama) ← nomic-embed-text 768-dim on neurosphere RTX 3060
_embed_with_delta()      ← α=0.80 identity blending (NEW)
  ↓
TurboQuantStore          ← 3-bit fp16 streaming index, 0.03ms search (NEW)
  ↓
SemanticShadow.process_entity()
  → TurboQuant search    ← Primary (NEW)
  → FAISS search         ← Fallback
  → MMR selection        ← Diversity filter (NEW)
  → Temporal decay       ← Age-weighted bumps (NEW)
  ↓
ShadowGraph              ← Speculative edge store
  → AttentionEngine      ← 5-component scoring incl. W_PROTO (NEW)
  → SSE stream           ← Seq numbers + resync + backpressure (NEW)
  ↓
Deck.gl Frontend         ← Dual-layer validated/speculative rendering (NEW)
```
---
## What’s Next
The `BehavioralFingerprint` 22-dim vector is ready. The TurboQuantStore `fingerprint_store()` singleton is waiting. The missing step is the fusion path:
```python
fused = concat(embed_768, fingerprint_22)  # 790-dim
_tq_store.add(entity_id, fused)
```
Once that’s wired, the similarity search finds actors by *what they do*, not just *what they’re described as*. That closes the “same actor, different IP” loop.
The Android ScytheCommandApp is the other frontier — all of these intelligence layers need a mobile tactical interface that can run disconnected from the LAN and resync over Tailscale. The project scaffold, MainActivity, and mDNS discovery layer are next.
The system now has:
- A memory that forgets slowly (delta embeddings)
- A sensor that detects intent violations (protocol_intel)
- A compressor that thinks at cache speed (TurboQuant)
- An attention system that knows what matters (AttentionEngine)
The next stage is giving it a hand.
---
*NerfEngine is a local-first, RF-aware tactical intelligence platform. All inference runs on local hardware. No data leaves the edge.*
