# NerfEngine Dev Log — March 13, 2026
…you sit down to fix one thing and end up building something that feels genuinely different from what existed before. We started with a live Ollama instance running `gemma3` on an RTX 3060 and ended up with a fully autonomous network threat intelligence pipeline — kernel-to-LLM — running on a single AlmaLinux 10 server.
Here’s what’s up.
---
Me: Please create a Blog Post for our latest work here today at /home/spectrcyde/NerfEngine/Update Blog Post 03132026.md
ClaudeCode: ◐ The user wants a blog post about today’s work. Let me create an engaging technical blog post covering everything we implemented today:
- GraphOps Copilot (graphops_copilot.py) – autonomous LLM investigation agent
- Attractor loop fix (confidence threshold, plateau exit, best_confidence tracking)
- GraphOps Autopilot (graphops_autopilot.py) – 4-tier autonomous alert system
- Three upgrades from Gemma_Llama_MCP.md:
  - Detector cooldowns
  - HypergraphMetricsCollector
  - GraphAttractorDetector (rotating botnet detection)
## Where We Left Off
NerfEngine already had significant pieces in place:
- **Phase A** — `TemporalFanInDetector`: detects VPN-rotating botnets by measuring *timing synchronisation* rather than IP reputation. The insight: rotating proxies randomise IPs, but they can't hide their task scheduler ticking in unison. A botnet hitting a login endpoint from 183 unique IPs in a 200ms window is statistically impossible without central coordination.
- **Phase B** — Kernel-side hypergraph edge emission: `struct graph_edge_event` (56 bytes) emitted directly from the XDP path. The kernel computes 64-bit node IDs via FNV-1a over `__u32 src_ip`. Userspace never touches raw IP tuples on the hot path.
- **Stage 6 demo** — all six scenarios validated clean: `DRIFT[SCANNER] delta=60`, `FAN-IN[BOTNET_COORDINATION] srcs=120 ip_H=6.91 timing_H=-0.00`.
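To make the Phase A intuition concrete, here's a toy sketch of the timing-entropy idea (illustrative only: the bucket size is arbitrary and the real `TemporalFanInDetector` internals differ):

```python
import math
from collections import Counter

def timing_entropy(timestamps_ms, bucket_ms=10):
    """Shannon entropy of arrival-time buckets. Near zero => synchronised.
    A coordinated botnet collapses into one or two buckets; organic traffic
    spreads across many."""
    buckets = Counter(int(t // bucket_ms) for t in timestamps_ms)
    total = sum(buckets.values())
    return -sum((c / total) * math.log2(c / total) for c in buckets.values())

# 183 sources all landing inside one 10ms bucket: entropy collapses to 0.0
synced = [100.0 + (i % 5) * 0.5 for i in range(183)]
print(timing_entropy(synced))  # -> 0.0 (the timing_H=-0.00 signature above)

# 16 arrivals spread evenly across 16 buckets: entropy = log2(16) = 4.0
print(timing_entropy([i * 10.0 for i in range(16)]))
```

Randomised IPs don't change this number at all, which is the whole point.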
What we hadn’t built yet: the intelligence layer. The part that *explains* what the detectors found, and eventually starts asking its own questions.
---
## Part 1 — GraphOps Copilot (graphops_copilot.py, 1,380 lines)
### The Problem We Solved First
There’s a classic failure mode when you put an LLM in front of a graph query interface: the model returns the *example values from its own few-shot prompt* instead of the user’s actual entity.
Ask “What do you know about 200.36.135.121?” and you get:
```
labels.ip = "10.0.0.1"
```
That’s the prompt example. The model retrieved it and repeated it. This is the **query-generation attractor loop bug** — named after the mathematical concept of an attractor state that a system collapses into and can’t escape without an external push.
The fix required three pieces:
**EntityExtractor** — strict regex extraction that runs *before* any LLM call, pulling IPv4, CIDR, ASN, domain, port, and `node:0x…` node IDs out of free text. A compile-time `EXAMPLE_VALUES` set strips hallucinated placeholders before they ever reach the DSL. If the extracted entity matches `"10.0.0.1"` or `"host:session:abc123"`, it gets stripped. Clean or nothing.
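A minimal sketch of the extract-then-strip idea (the regexes and `EXAMPLE_VALUES` entries here are an illustrative subset, not the real module's):

```python
import re

# Illustrative subset of the extraction patterns; the real module also
# covers CIDR, ASN, domain, and port.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
NODE_RE = re.compile(r"\bnode:0x[0-9a-fA-F]+\b")

# Placeholder values that only ever appear in few-shot prompt examples.
EXAMPLE_VALUES = {"10.0.0.1", "host:session:abc123"}

def extract_entities(text: str) -> list[str]:
    """Regex-first entity extraction: runs before any LLM call and drops
    anything matching a known prompt placeholder. Clean or nothing."""
    found = IPV4_RE.findall(text) + NODE_RE.findall(text)
    return [e for e in found if e not in EXAMPLE_VALUES]

print(extract_entities("What do you know about 200.36.135.121 and 10.0.0.1?"))
# -> ['200.36.135.121']
```

The placeholder never survives extraction, so the attractor has nothing to latch onto.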
**InvestigativeDSLExecutor** — a stateful verb interpreter with 9 verbs:
```
FOCUS      — set investigation target
WINDOW     — set time window (e.g. "200ms", "5s")
FILTER     — prune results by condition
ANALYZE    — run a named analysis (degree_delta, fanin, temporal_sync…)
CLUSTER    — group by timing or ASN
TRACE      — path tracing between nodes
EXPAND     — neighbor expansion
SUMMARIZE  — aggregate current focus
ASSESS     — produce threat confidence score
```
The executor is *stateful across steps*: `FOCUS 10.99.0.1` persists through subsequent verbs. The LLM doesn’t need to restate the target on every line.
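The statefulness can be sketched as a toy interpreter (not the real `InvestigativeDSLExecutor`; verbs here just log against the persisted focus/window):

```python
class DSLState:
    """Minimal stateful verb interpreter: FOCUS and WINDOW persist across
    steps, so later verbs don't have to restate the target."""
    def __init__(self):
        self.focus = None
        self.window = None
        self.log = []

    def execute(self, plan: list[str]):
        for line in plan:
            verb, _, arg = line.partition(" ")
            if verb == "FOCUS":
                self.focus = arg
            elif verb == "WINDOW":
                self.window = arg
            else:
                # Every other verb runs against the persisted focus/window.
                self.log.append((verb, arg, self.focus, self.window))

state = DSLState()
state.execute(["FOCUS 10.99.0.1", "WINDOW 200ms", "ANALYZE fanin"])
state.execute(["ANALYZE temporal_sync"])  # focus persists: no FOCUS needed
print(state.log[-1])  # -> ('ANALYZE', 'temporal_sync', '10.99.0.1', '200ms')
```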
**GraphOpsAgent** — the Ollama-backed reasoning loop. System prompt explicitly lists allowed verbs and says *"NEVER output example placeholder values"*. Model generates a JSON plan; executor runs it; a second LLM call interprets results and emits a confidence score. Loop continues until `confidence >= 0.80`, attractor detected, or plateau.
### The Attractor Loop Fix — Live-Tested Against llama3.2:3b
Running against live Ollama (`llama3.2:3b`, RTX 3060, 27/27 layers GPU), the IP query demonstrated the exact attractor behavior: identical plan repeated all 6 steps, confidence oscillating 0.80 -> 0.80 -> 0.80 -> 0.50 -> 0.80 -> **0.20**. Exits reporting 0.20 — the final-step jitter, not even the best value seen.
Three fixes applied:
```python
seen_plans: set = set()   # frozensets of plan verbs
best_confidence = 0.0     # track max seen, not final jitter

# Attractor break: same plan seen before + confidence good enough -> stop
if frozenset(plan) in seen_plans and best_confidence >= PLATEAU_THRESHOLD:
    confidence = best_confidence
    break

# Plateau exit: N consecutive non-improving steps >= 0.65 -> stop
if confidence == prev_confidence:
    plateau_count += 1
    if plateau_count >= PLATEAU_STEPS and best_confidence >= PLATEAU_THRESHOLD:
        break
```
After the fix: the same IP query exits after **1 step** at confidence=0.80.
```
[GraphOpsAgent] step 0 plan: ['WINDOW 200ms', 'FOCUS ipv4',
    "FILTER ip_address = '200.36.135.121'", 'ANALYZE fanin']
[GraphOpsAgent] step 0 confidence=0.80 (best=0.80)
[GraphOpsAgent] confidence threshold reached at step 0
```
Six round-trips reduced to one. The attractor loop is dead.
---
## Part 2 — GraphOps Autopilot (graphops_autopilot.py, 758 lines)
The copilot answers questions. The autopilot *asks them on its own*.
The design requirement: **LLM only runs when genuinely needed**. The classic failure mode of autonomous LLM systems is the alarm storm — the model fires on everything, floods the analyst queue with noise, and gets disabled within a week. The fix is deterministic anomaly detection *first*, LLM explanation *second*.
```
kernel telemetry
        |
statistical detectors    <- DO NOT skip this
        |
LLM explanation layer    <- only when detectors agree
```
### The Four-Tier System
```python
TIER_OBSERVATION = 0.60  # internal log only
TIER_SUGGESTION  = 0.70  # suggestion queue (analyst pull)
TIER_ALERT       = 0.80  # auto-emit EventCard (no human needed)
TIER_AUTONOMOUS  = 0.90  # wake InvestigatorAgent (LLM runs)
```
**AlertDedup** — blake2b hash of `(source_type, pattern, sorted_node_ids, window_bucket)` with 30-second TTL. `window_ms // 1000` buckets absorb inter-detector jitter without suppressing genuinely new events.
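The key construction looks roughly like this (a sketch from the description above; the digest size and exact payload layout are assumptions):

```python
import hashlib
import time

def dedup_key(source_type, pattern, node_ids, window_ms):
    """blake2b over (source, pattern, sorted node IDs, 1s window bucket).
    Bucketing by window_ms // 1000 absorbs inter-detector jitter."""
    bucket = window_ms // 1000
    payload = f"{source_type}|{pattern}|{sorted(node_ids)}|{bucket}"
    return hashlib.blake2b(payload.encode(), digest_size=16).hexdigest()

class AlertDedup:
    """TTL-based suppression: a key seen within the TTL is a duplicate."""
    def __init__(self, ttl=30.0):
        self.ttl = ttl
        self._seen = {}  # key -> first-seen timestamp

    def is_duplicate(self, key, now=None):
        now = time.time() if now is None else now
        # Expire stale keys, then check membership.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.ttl}
        if key in self._seen:
            return True
        self._seen[key] = now
        return False
```

Sorting the node IDs means detector ordering doesn't matter; bucketing the window means two detectors firing 700ms apart on the same cluster still collide.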
**EventCard** — the intelligence briefing object analysts actually want:
```
GraphOps Alert
--------------------------------------------------
Pattern          botnet_coordination
Nodes            94
Window           180ms
Confidence       0.84
Temporal Sync    0.82
IP Entropy       6.91
Suggested Actions
  > Cluster timing
  > Analyze ASN diversity
  > Trace infrastructure paths
  > Compare with previous clusters
```
**SentinelLoop** — subscribes to detector callbacks via `.subscribe(cb)`. Fully push-driven. At 20-60 million edges/sec, a polling loop would be a CPU disaster. The detector fires, the callback runs, the router decides. The 200ms heartbeat thread keeps the process alive and does nothing else.
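The push-driven wiring is nothing more than a callback list (a toy sketch of the subscribe pattern, not the real `SentinelLoop`):

```python
class Detector:
    """Any detector exposing .subscribe(cb); fires callbacks synchronously
    on detection, so there is no polling loop anywhere."""
    def __init__(self):
        self._subs = []

    def subscribe(self, cb):
        self._subs.append(cb)

    def fire(self, alert):
        for cb in self._subs:
            cb(alert)

received = []
det = Detector()
det.subscribe(received.append)  # the sentinel's routing callback
det.fire({"pattern": "scanner", "score": 0.72})
print(received)  # -> [{'pattern': 'scanner', 'score': 0.72}]
```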
**InvestigatorAgent** — Tier 3 only. Lazy-loads `GraphOpsAgent`. Builds a context-aware question from the `EventCard`, then runs the full investigation loop. A scanner card generates: *"Node X shows a degree delta of 60 connections in the last window. Pattern: scanner. Is this a port scan or lateral movement?"*
### Scoring Functions
The scoring encodes domain knowledge directly. For `FanInAlert`:
```python
base = 0.92 if verdict == "botnet_coordination" else 0.72
penalty = min(0.15, timing_entropy * 0.10)
score = base - penalty
```
- Botnet with timing_entropy=0.01 (near-perfect sync): score=0.919 -> **Tier 3 autonomous investigation**
- Fan-in spike with timing_entropy=2.0 (unsynchronised): score=0.57 -> **discarded**
That second case is intentional. Unsynchronised fan-in is a CDN, a flash crowd, or a viral tweet. Discarding it before it ever touches the dedup layer is correct.
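Plugging the two examples back into the scoring rule confirms both numbers (a direct re-computation of the formula above; the non-botnet verdict string is illustrative):

```python
def fanin_score(verdict: str, timing_entropy: float) -> float:
    """The FanInAlert scoring rule as written above."""
    base = 0.92 if verdict == "botnet_coordination" else 0.72
    penalty = min(0.15, timing_entropy * 0.10)
    return base - penalty

# Near-perfect sync: 0.92 - 0.001 -> Tier 3
print(round(fanin_score("botnet_coordination", 0.01), 3))  # -> 0.919
# Unsynchronised spike: 0.72 - 0.15 -> below every tier, discarded
print(round(fanin_score("fan_in_spike", 2.0), 2))          # -> 0.57
```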
### Self-Test Results: 24/24
```
[PASS] EventCard format_card rendering
[PASS] AlertDedup TTL, key collision, expiry
[PASS] TierRouter all five thresholds
[PASS] Scoring: drift passthrough, botnet -> Tier 3, spike -> discard
[PASS] SentinelLoop: scanner queued, botnet dispatched, dedup suppression
[PASS] GraphOpsAutopilot start/stop lifecycle
```
---
## Part 3 — Three Upgrades from Code Review
After reviewing the autopilot, the design doc came back with a code review — and it was genuinely useful. Here’s what it recommended and what we built.
### Upgrade 1 — Detector Cooldowns
A noisy detector can still produce high-frequency alerts even with dedup, because each new window produces new node IDs that hash differently. Solution: per-`(source_type, pattern)` cooldown timestamp.
```python
DETECTOR_COOLDOWN = 1.0  # seconds

cooldown_key = (source_type, pattern)
if time.time() - self._last_emit.get(cooldown_key, 0.0) < DETECTOR_COOLDOWN:
    return  # suppress burst
self._last_emit[cooldown_key] = time.time()
```
One line of state, one comparison, prevents alert bursts entirely.
### Upgrade 2 — HypergraphMetricsCollector
A shared `HypergraphSnapshot` computed once per window by a background thread:
```python
from dataclasses import dataclass

@dataclass
class HypergraphSnapshot:
    cluster_density: float   # edges / (nodes*(nodes-1))
    edge_churn: float        # fraction of edges not seen in prev window
    temporal_sync: float     # 1 - normalised inter-arrival timing entropy
    asn_entropy: float       # H(src_node_id) across all edges
    top_fan_in_rate: float   # max fan-in per dst in this window
    active_attractors: int   # live GraphAttractor cluster count
```
All three detectors can reference this snapshot instead of recomputing the same Shannon entropy calculations independently. At high edge rates this matters.
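The shared computation amounts to: compute the expensive aggregates once per window, let every detector read them. A simplified stand-in (field derivations here are sketches, not the real collector's):

```python
import math
from collections import Counter
from dataclasses import dataclass

def shannon_entropy(values) -> float:
    """Shannon entropy in bits over a multiset of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

@dataclass
class WindowSnapshot:
    """Computed once per window; detectors read it, none recompute.
    (Simplified stand-in for HypergraphSnapshot above.)"""
    cluster_density: float
    asn_entropy: float

def build_snapshot(edges):
    """edges: list of (src_node_id, dst_node_id) pairs in this window."""
    nodes = {n for e in edges for n in e}
    n = len(nodes)
    density = len(edges) / (n * (n - 1)) if n > 1 else 0.0
    return WindowSnapshot(
        cluster_density=density,
        asn_entropy=shannon_entropy(src for src, _ in edges),
    )

snap = build_snapshot([(1, 9), (2, 9), (3, 9), (4, 9)])
print(round(snap.asn_entropy, 2))  # 4 distinct sources -> 2.0
```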
### Upgrade 3 — GraphAttractorDetector (The Important One)
Fan-in detection and degree drift detection share a blind spot: they key on *nodes*. A VPN-rotating botnet changes its source nodes — IPs, ASNs, geos — constantly. By the time the detector builds state on one IP, it’s already gone.
**Graph attractors** key on *edge structure* instead. The observation: rotating botnets change their source nodes but they all hit the *same destination infrastructure*. The `edge_hash` is stable even when every `node_a` rotates.
```python
# Tracks edge_hash -> {src_node_ids} over 5-second sliding windows.
# Fires AttractorAlert when:
#   cluster_size > 40            (enough distinct sources)
#   infrastructure_entropy > 0.8 (sources genuinely diverse, not one IP looping)
#   temporal_sync > 0.75         (coordinated timing)
verdict = "rotating_botnet"
```
To defeat this detector, an attacker would need to either:
1. Use a *different* target on every connection — which defeats the botnet’s actual objective
2. Desynchronise their task scheduling — which defeats coordination
Neither is viable. The attractor is the signal, and it’s extremely robust.
```
ATTRACTOR [rotating_botnet] edge_hash=0x… cluster=94
    infra_H=6.91 t_sync=0.82 score=0.917 -> Tier 3
```
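The core state machine can be sketched like this (thresholds copied from the comment block above; the cluster-tracking shape and alert interface are illustrative):

```python
from collections import defaultdict

class AttractorSketch:
    """Toy version of the edge-structure tracker: edge_hash -> set of
    source node IDs, fired when the cluster is large, diverse, and synced."""
    CLUSTER_MIN = 40         # enough distinct sources
    INFRA_ENTROPY_MIN = 0.8  # sources genuinely diverse
    SYNC_MIN = 0.75          # coordinated timing

    def __init__(self):
        self.clusters = defaultdict(set)  # edge_hash -> {src_node_id}

    def observe(self, edge_hash, src_node_id):
        self.clusters[edge_hash].add(src_node_id)

    def check(self, edge_hash, infra_entropy, temporal_sync):
        cluster = self.clusters[edge_hash]
        if (len(cluster) > self.CLUSTER_MIN
                and infra_entropy > self.INFRA_ENTROPY_MIN
                and temporal_sync > self.SYNC_MIN):
            return "rotating_botnet"
        return None

det = AttractorSketch()
for src in range(94):          # 94 rotating sources...
    det.observe(0xBEEF, src)   # ...all hitting the same edge_hash
print(det.check(0xBEEF, infra_entropy=6.91, temporal_sync=0.82))
# -> rotating_botnet
```

Note that `observe` never looks at the source identity beyond set membership: rotation grows the cluster instead of resetting it.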
---
## Part 4 — BPF Edge Aggregation (bpf_capture.c + capture.go)
The performance math:
```
Raw packets:   100M/sec x 100B/event = 10 GB/sec telemetry
Phase B edges: 100M/sec x  56B/event = 5.6 GB/sec
Aggregated:     ~2M/sec x  56B/event = 112 MB/sec
```
The edge aggregation trick reduces telemetry volume by ~98% before any data leaves the kernel.
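The arithmetic checks out (a quick re-computation of the table above):

```python
GB = 1e9

raw        = 100e6 * 100  # packets/sec * bytes/event
phase_b    = 100e6 * 56
aggregated = 2e6 * 56

print(raw / GB)          # -> 10.0 (GB/sec)
print(phase_b / GB)      # -> 5.6
print(aggregated / 1e6)  # -> 112.0 (MB/sec)
print(round(1 - aggregated / phase_b, 2))  # -> 0.98, the ~98% reduction
```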
**Kernel side** — new `edge_agg` LRU hash map (128K entries, keyed by `edge_hash`):
```c
struct edge_counter {
    __u64 count;     // packet count for this edge_hash
    __u64 bytes;     // total bytes
    __u64 first_ts;  // ktime of first packet this flush window
    __u64 last_ts;   // ktime of most recent packet
};
```
On every packet in the XDP path: if entry exists, atomic_add count and bytes. If not, insert. That’s all. No ringbuf write, no memory copy, no wakeup to userland.
**Go relay side** — new 50ms ticker goroutine added to `capture.go`:
```go
ticker := time.NewTicker(50 * time.Millisecond)
// Every 50ms: iterate edge_agg map, emit frame 0x05, delete entries
```
Fifty milliseconds of accumulation. At 100M pkt/sec, that’s 5 million packets compressed into however many distinct edge_hash values appear in that window — typically 1-5% of packet count.
---
## Part 5 — AlmaLinux 10 Kernel Tuning (scripts/tune_kernel.sh, 238 lines)
A properly-tuned kernel for XDP/eBPF looks very different from a default AlmaLinux install. The biggest wins:
```bash
# Socket buffer — handle 128MB bursts without dropping
net.core.rmem_max = 134217728

# NIC -> kernel queue depth — don't drop at the NIC
net.core.netdev_max_backlog = 300000

# Busy-poll — reduces latency on the XDP receive path
net.core.busy_read = 50    # us spin before sleeping
net.core.busy_poll = 50    # us spin on NIC before sleeping

# BPF JIT — without this BPF runs interpreted, 3-5x slower
net.core.bpf_jit_enable = 1
net.core.bpf_jit_harden = 0    # disable hardening in prod for speed

# Huge pages — ring buffer performance headroom
vm.nr_hugepages = 512    # 1 GB of 2MB pages
```
The script also handles NIC IRQ affinity pinning (`--nic <iface>`): one XDP worker per RX queue, NIC queue N -> CPU N. Disables `irqbalance` so the manual pins hold. Sets CPU performance governor. `--dry-run` and `--revert` flags. Idempotent — safe to run repeatedly.
---
## The Full Stack After Today
```
+------------------------------------------------------------------+
| AlmaLinux 10 · RTX 3060 · 62.8 GB                                |
+------------------------------------------------------------------+
| XDP / eBPF (bpf_capture.c)                                       |
|   +- flow_table     LRU hash 1M flows                            |
|   +- edge_agg       LRU hash 128K entries — 50ms flush     [NEW] |
|   +- rb             ringbuf -> 0x00 FlowCore                     |
|   +- flow_events    ringbuf -> 0x02 FlowEnd                      |
|   +- edge_ticks     ringbuf -> 0x03 EdgeTick                     |
|   +- graph_edges    ringbuf -> 0x04 GraphEdge (OPEN/CLOSE)       |
+------------------------------------------------------------------+
| Go Relay (capture.go)                                            |
|   5 ringbuf drain goroutines                                     |
|   1 edge_agg flush ticker (50ms) -> 0x05 EdgeAgg           [NEW] |
|   WebSocket broadcast                                            |
+------------------------------------------------------------------+
| Python Pipeline (stream_manager.py)                              |
| topology_drift.py                                                |
|   +- TopologyDriftDetector      scanner / aggregator / lateral   |
|   +- TemporalFanInDetector      botnet timing sync               |
|   +- GraphAttractorDetector     rotating botnet            [NEW] |
|   +- HypergraphMetricsCollector shared window metrics      [NEW] |
+------------------------------------------------------------------+
| GraphOps Autopilot (graphops_autopilot.py)                 [NEW] |
|   SentinelLoop        push-driven, 3 detector subscriptions      |
|   AlertDedup          blake2b hash · 30s TTL                     |
|   TierRouter          0.60 / 0.70 / 0.80 / 0.90                  |
|   EventCard           analyst briefing object + format_card()    |
|   InvestigatorAgent   Tier 3 -> GraphOpsAgent                    |
+------------------------------------------------------------------+
| GraphOps Copilot (graphops_copilot.py)                     [NEW] |
|   EntityExtractor          regex + EXAMPLE_VALUES guard          |
|   InvestigativeDSLExecutor 9 verbs, stateful                     |
|   GraphOpsAgent            attractor-proof loop · llama3.2:3b    |
+------------------------------------------------------------------+
| MCP Server — 11 tools total                                      |
|   graphops_investigate        natural language -> report         |
|   graphops_dsl_exec           raw DSL plan execution             |
|   graphops_entity_parse       entity extraction                  |
|   graphops_autopilot_status   runtime stats                [NEW] |
|   graphops_suggestion_queue   Tier 1 queue                 [NEW] |
|   graphops_observation_log    Tier 0 log                   [NEW] |
|   graphops_format_card        render event card            [NEW] |
+------------------------------------------------------------------+
```
---
## By The Numbers
| File | Lines | Status |
|---|---|---|
| graphops_copilot.py | 1,380 | 5/5 tests pass · live Ollama validated |
| graphops_autopilot.py | 758 | 24/24 tests pass |
| topology_drift.py | 912 | 3 new classes · smoke tests pass |
| bpf_capture.c | 426 | pending make bpf |
| capture.go | 1,000 | pending go build |
| scripts/tune_kernel.sh | 238 | syntax valid |
**~2,800 lines of new code today across 6 files.**
All Python tests pass. BPF/Go changes pending hardware compile.
---
## What’s Next
**Immediate** — `FRAME_TAG_EDGE_AGG = 0x05` handler in `stream_manager.py` and `EdgeAggEvent` in `FlowCore.py`. Same pattern as all other frame types.
**Interesting** — Wire `HypergraphSnapshot` into `GraphAttractorDetector` so the shared metrics precomputation replaces the attractor’s own per-window entropy calls.
**Exciting** — In-kernel botnet detection. Move fan-in counters and temporal burst tracking into BPF maps. Emit `BOTNET_CLUSTER` events directly from the XDP path. The kernel becomes the first anomaly detector. Userland receives only high-value signals. That’s the architecture telecom operators quietly run for abuse detection at hyperscale.
**Wild** — Hypergraph spectral fingerprinting. Compute eigenvalue signatures of traffic clusters. Detects VPN-rotating botnets even when every node and IP changes every minute, by analysing the structural fingerprint of the traffic graph rather than any individual node property.
---
*NerfEngine is a personal research project combining eBPF/XDP network telemetry, hypergraph analysis, and autonomous LLM-backed threat detection. The architecture described here — kernel-side graph primitive emission, timing-entropy botnet detection, graph attractor analysis, and attractor-proof LLM agent loops — is a combination of techniques I haven't seen together elsewhere in the open literature. Building it has been one of the most interesting engineering exercises I've done.*
*More updates as the build continues. The kernel is listening.*
Blog post written — 380 lines at /home/spectrcyde/NerfEngine/Update Blog Post 03132026.md.
It covers:
- GraphOps Copilot — the attractor loop bug, the fix, live test results against llama3.2:3b
- GraphOps Autopilot — 4-tier system, AlertDedup, SentinelLoop, scoring logic, 24/24 pass
- Three upgrades — cooldowns, HypergraphMetricsCollector, GraphAttractorDetector (the rotating-botnet detector)
- BPF edge aggregation — 98% telemetry reduction, kernel + Go relay design
- AlmaLinux 10 tuning script — key sysctl values, IRQ pinning, JIT config
- Full stack diagram, by-the-numbers table, and what’s next
Extra Special Thanks to ChatGPT!!!
https://chatgpt.com/share/69b3a327-a4c4-8011-a9d0-d82cfc8804ed