
Kernel Edges Enable Autonomous Threat Intelligence

# NerfEngine Dev Log — March 13, 2026

…You sit down to fix one thing and end up building something that feels genuinely different from what existed before. We started with a live Ollama instance running `gemma3` on an RTX 3060 and ended up with a fully autonomous network threat intelligence pipeline — kernel-to-LLM — running on a single AlmaLinux 10 server.

Here’s what’s up.

Me: Please create a Blog Post for our latest work here today at /home/spectrcyde/NerfEngine/Update Blog Post 03132026.md

ClaudeCode: The user wants a blog post about today’s work. Let me create an engaging technical blog post covering everything we implemented today:

  1. GraphOps Copilot (graphops_copilot.py) – autonomous LLM investigation agent
  2. Attractor loop fix (confidence threshold, plateau exit, best_confidence tracking)
  3. GraphOps Autopilot (graphops_autopilot.py) – 4-tier autonomous alert system
  4. Three upgrades from Gemma_Llama_MCP.md:
    • Detector cooldowns
    • HypergraphMetricsCollector
    • GraphAttractorDetector (rotating botnet detection)

## Where We Left Off

NerfEngine already had significant pieces in place:

**Phase A** — `TemporalFanInDetector`: detects VPN-rotating botnets by measuring *timing synchronisation* rather than IP reputation. The insight: rotating proxies randomise IPs but they can’t hide their task scheduler ticking in unison. A botnet hitting a login endpoint from 183 unique IPs in a 200ms window is statistically impossible without central coordination.

**Phase B** — Kernel-side hypergraph edge emission: `struct graph_edge_event` (56 bytes) emitted directly from the XDP path. The kernel computes 64-bit node IDs via FNV-1a over `__u32 src_ip`. Userspace never touches raw IP tuples on the hot path.

**Stage 6 demo** — all six scenarios validated clean: `DRIFT[SCANNER] delta=60`, `FAN-IN[BOTNET_COORDINATION] srcs=120 ip_H=6.91 timing_H=-0.00`.

What we hadn’t built yet: the intelligence layer. The part that *explains* what the detectors found, and eventually starts asking its own questions.

## Part 1 — GraphOps Copilot (graphops_copilot.py, 1,380 lines)

### The Problem We Solved First

There’s a classic failure mode when you put an LLM in front of a graph query interface: the model returns the *example values from its own few-shot prompt* instead of the user’s actual entity.

Ask “What do you know about 200.36.135.121?” and you get:

```
labels.ip = "10.0.0.1"
```

That’s the prompt example. The model retrieved it and repeated it. This is the **query-generation attractor loop bug** — named after the mathematical concept of an attractor state that a system collapses into and can’t escape without an external push.

The fix required three pieces:

**EntityExtractor** — strict regex extraction that runs *before* any LLM call, pulling IPv4, CIDR, ASN, domain, port, and `node:0x…` node IDs out of free text. A compile-time `EXAMPLE_VALUES` set strips hallucinated placeholders before they ever reach the DSL. If the extracted entity matches `"10.0.0.1"` or `"host:session:abc123"`, it gets stripped. Clean or nothing.
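A minimal sketch of the extract-then-strip idea. The regex patterns and the contents of `EXAMPLE_VALUES` here are illustrative stand-ins, not the actual implementation:

```python
import re

# Placeholder values known to leak out of few-shot prompts (illustrative set)
EXAMPLE_VALUES = {"10.0.0.1", "host:session:abc123"}

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
CIDR_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}/\d{1,2}\b")
ASN_RE  = re.compile(r"\bAS\d+\b")
NODE_RE = re.compile(r"\bnode:0x[0-9a-fA-F]+\b")

def extract_entities(text: str) -> list[str]:
    """Pull candidate entities out of free text, then drop known placeholders."""
    found = []
    for pattern in (CIDR_RE, IPV4_RE, ASN_RE, NODE_RE):
        found.extend(pattern.findall(text))
    # Clean or nothing: anything matching a prompt example verbatim is stripped
    return [e for e in found if e not in EXAMPLE_VALUES]
```

So `extract_entities("What do you know about 200.36.135.121?")` yields the real IP, while text containing only the prompt example `10.0.0.1` yields an empty list rather than a hallucinated target.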

**InvestigativeDSLExecutor** — a stateful verb interpreter with 9 verbs:

```
FOCUS     — set investigation target
WINDOW    — set time window (e.g. "200ms", "5s")
FILTER    — prune results by condition
ANALYZE   — run a named analysis (degree_delta, fanin, temporal_sync…)
CLUSTER   — group by timing or ASN
TRACE     — path tracing between nodes
EXPAND    — neighbor expansion
SUMMARIZE — aggregate current focus
ASSESS    — produce threat confidence score
```

The executor is *stateful across steps*: `FOCUS 10.99.0.1` persists through subsequent verbs. The LLM doesn’t need to restate the target on every line.
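The statefulness can be sketched roughly like this, with verb handlers reduced to simple state mutations (the real executor's analysis verbs are of course far richer; the class and field names here are hypothetical):

```python
class InvestigativeState:
    """Carries focus/window/filters across DSL steps."""
    def __init__(self):
        self.focus = None
        self.window = None
        self.filters = []

def execute_plan(plan: list[str]) -> InvestigativeState:
    state = InvestigativeState()
    for step in plan:
        verb, _, arg = step.partition(" ")
        if verb == "FOCUS":
            state.focus = arg          # persists through subsequent verbs
        elif verb == "WINDOW":
            state.window = arg
        elif verb == "FILTER":
            state.filters.append(arg)
        # ANALYZE / CLUSTER / TRACE / ... would read state.focus here,
        # so the LLM never has to restate the target on every line
    return state
```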

**GraphOpsAgent** — the Ollama-backed reasoning loop. System prompt explicitly lists allowed verbs and says *"NEVER output example placeholder values"*. Model generates a JSON plan; executor runs it; a second LLM call interprets results and emits a confidence score. Loop continues until `confidence >= 0.80`, attractor detected, or plateau.

### The Attractor Loop Fix — Live-Tested Against llama3.2:3b

Running against live Ollama (`llama3.2:3b`, RTX 3060, 27/27 layers GPU), the IP query demonstrated the exact attractor behavior: identical plan repeated all 6 steps, confidence oscillating 0.80 -> 0.80 -> 0.80 -> 0.50 -> 0.80 -> **0.20**. Exits reporting 0.20 — the final-step jitter, not even the best value seen.

Three fixes applied:

```python
seen_plans: set = set()          # frozensets of plan verbs
best_confidence = 0.0            # track max seen, not final jitter

# Attractor break: same plan seen before + confidence good enough -> stop
if frozenset(plan) in seen_plans and best_confidence >= PLATEAU_THRESHOLD:
    confidence = best_confidence
    break

# Plateau exit: N consecutive non-improving steps >= 0.65 -> stop
if confidence == prev_confidence:
    plateau_count += 1
    if plateau_count >= PLATEAU_STEPS and best_confidence >= PLATEAU_THRESHOLD:
        break
```

After the fix: the same IP query exits after **1 step** at confidence=0.80.

```
[GraphOpsAgent] step 0 plan: ['WINDOW 200ms', 'FOCUS ipv4',
    "FILTER ip_address = '200.36.135.121'", 'ANALYZE fanin']
[GraphOpsAgent] step 0 confidence=0.80 (best=0.80)
[GraphOpsAgent] confidence threshold reached at step 0
```

Six round-trips reduced to one. The attractor loop is dead.

## Part 2 — GraphOps Autopilot (graphops_autopilot.py, 758 lines)

The copilot answers questions. The autopilot *asks them on its own*.

The design requirement: **LLM only runs when genuinely needed**. The classic failure mode of autonomous LLM systems is the alarm storm — the model fires on everything, floods the analyst queue with noise, and gets disabled within a week. The fix is deterministic anomaly detection *first*, LLM explanation *second*.

```
kernel telemetry
      |
statistical detectors   <- DO NOT skip this
      |
LLM explanation layer   <- only when detectors agree
```

### The Four-Tier System

```python
TIER_OBSERVATION = 0.60   # internal log only
TIER_SUGGESTION  = 0.70   # suggestion queue (analyst pull)
TIER_ALERT       = 0.80   # auto-emit EventCard (no human needed)
TIER_AUTONOMOUS  = 0.90   # wake InvestigatorAgent (LLM runs)
```

**AlertDedup** — blake2b hash of `(source_type, pattern, sorted_node_ids, window_bucket)` with 30-second TTL. `window_ms // 1000` buckets absorb inter-detector jitter without suppressing genuinely new events.
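One way the dedup key and TTL bookkeeping could be assembled, assuming the field names given above (the actual hashing inputs and digest size may differ):

```python
import hashlib
import time

DEDUP_TTL = 30.0  # seconds

def dedup_key(source_type: str, pattern: str, node_ids, window_ms: int) -> str:
    """blake2b over the alert identity; window_ms // 1000 buckets absorb jitter."""
    bucket = window_ms // 1000
    material = f"{source_type}|{pattern}|{','.join(sorted(node_ids))}|{bucket}"
    return hashlib.blake2b(material.encode(), digest_size=16).hexdigest()

class AlertDedup:
    def __init__(self):
        self._seen = {}  # key -> expiry timestamp

    def is_duplicate(self, key: str, now: float = None) -> bool:
        if now is None:
            now = time.time()
        # lazily drop expired entries, then check membership
        self._seen = {k: t for k, t in self._seen.items() if t > now}
        if key in self._seen:
            return True
        self._seen[key] = now + DEDUP_TTL
        return False
```

Sorting the node IDs makes the key order-insensitive, so two detectors reporting the same cluster in different orders hash identically.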

**EventCard** — the intelligence briefing object analysts actually want:

```
GraphOps Alert
--------------------------------------------------
Pattern           botnet_coordination
Nodes             94
Window            180ms
Confidence        0.84
Temporal Sync     0.82
IP Entropy        6.91

Suggested Actions
  > Cluster timing
  > Analyze ASN diversity
  > Trace infrastructure paths
  > Compare with previous clusters
```

**SentinelLoop** — subscribes to detector callbacks via `.subscribe(cb)`. Fully push-driven. At 20-60 million edges/sec, a polling loop would be a CPU disaster. The detector fires, the callback runs, the router decides. The 200ms heartbeat thread keeps the process alive and does nothing else.
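Stripped to its essentials, the push-driven wiring looks something like this (class and method names beyond `.subscribe(cb)` are hypothetical):

```python
class Detector:
    """Minimal publisher: detectors push alerts; nothing polls."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, cb):
        self._subscribers.append(cb)

    def fire(self, alert):
        for cb in self._subscribers:
            cb(alert)

class SentinelLoop:
    """Subscribes once at startup; work happens only when a detector fires."""
    def __init__(self, detectors, router):
        self.router = router
        for d in detectors:
            d.subscribe(self.on_alert)  # push-driven: no polling loop

    def on_alert(self, alert):
        self.router(alert)  # tier routing decides what happens next
```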

**InvestigatorAgent** — Tier 3 only. Lazy-loads `GraphOpsAgent`. Builds a context-aware question from the `EventCard`, then runs the full investigation loop. A scanner card generates: *”Node X shows a degree delta of 60 connections in the last window. Pattern: scanner. Is this a port scan or lateral movement?”*

### Scoring Functions

The scoring encodes domain knowledge directly. For `FanInAlert`:

```python
base = 0.92 if verdict == "botnet_coordination" else 0.72
penalty = min(0.15, timing_entropy * 0.10)
score = base - penalty
```

- Botnet with timing_entropy=0.01 (near-perfect sync): score=0.919 -> **Tier 3 autonomous investigation**

- Fan-in spike with timing_entropy=2.0 (unsynchronised): score=0.57 -> **discarded**

That second case is intentional. Unsynchronised fan-in is a CDN, a flash crowd, or a viral tweet. Discarding it before it ever touches the dedup layer is correct.
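A quick check of the two worked cases above, with the constants copied from the snippet (the non-botnet verdict string here is a placeholder):

```python
def score_fanin(verdict: str, timing_entropy: float) -> float:
    # Constants as given in the scoring snippet above
    base = 0.92 if verdict == "botnet_coordination" else 0.72
    penalty = min(0.15, timing_entropy * 0.10)
    return base - penalty

# Near-perfect sync:    0.92 - 0.001 = 0.919  -> clears the 0.90 Tier 3 bar
# Unsynchronised spike: 0.72 - 0.15  = 0.57   -> below every tier, discarded
```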

### Self-Test Results: 24/24

```
[PASS] EventCard format_card rendering
[PASS] AlertDedup TTL, key collision, expiry
[PASS] TierRouter all five thresholds
[PASS] Scoring: drift passthrough, botnet -> Tier 3, spike -> discard
[PASS] SentinelLoop: scanner queued, botnet dispatched, dedup suppression
[PASS] GraphOpsAutopilot start/stop lifecycle
```

## Part 3 — Three Upgrades from Code Review

After reviewing the autopilot, the design doc came back with a code review — and it was genuinely useful. Here’s what it recommended and what we built.

### Upgrade 1 — Detector Cooldowns

A noisy detector can still produce high-frequency alerts even with dedup, because each new window produces new node IDs that hash differently. Solution: per-`(source_type, pattern)` cooldown timestamp.

```python
DETECTOR_COOLDOWN = 1.0  # seconds

cooldown_key = (source_type, pattern)
if time.time() - self._last_emit.get(cooldown_key, 0.0) < DETECTOR_COOLDOWN:
    return  # suppress burst
self._last_emit[cooldown_key] = time.time()
```

One line of state, one comparison, prevents alert bursts entirely.

### Upgrade 2 — HypergraphMetricsCollector

A shared `HypergraphSnapshot` computed once per window by a background thread:

```python
@dataclass
class HypergraphSnapshot:
    cluster_density:   float  # edges / (nodes*(nodes-1))
    edge_churn:        float  # fraction of edges not seen in prev window
    temporal_sync:     float  # 1 - normalised inter-arrival timing entropy
    asn_entropy:       float  # H(src_node_id) across all edges
    top_fan_in_rate:   float  # max fan-in per dst in this window
    active_attractors: int    # live GraphAttractor cluster count
```

All three detectors can reference this snapshot instead of recomputing the same Shannon entropy calculations independently. At high edge rates this matters.
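The shared work amounts to one entropy pass per window instead of one per detector. A sketch of the two core computations, assuming timestamps in nanoseconds and a fixed bucketing choice (the bucket size and normalisation are illustrative):

```python
import math
from collections import Counter

def shannon_entropy(values) -> float:
    """H in bits over the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def temporal_sync(timestamps_ns, bucket_ns=1_000_000) -> float:
    """1 - normalised inter-arrival entropy: 1.0 = lockstep, 0.0 = noise."""
    ts = sorted(timestamps_ns)
    gaps = [(b - a) // bucket_ns for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return 1.0  # too few samples to call it unsynchronised
    h = shannon_entropy(gaps)
    h_max = math.log2(len(gaps))  # entropy if every gap were distinct
    return 1.0 - (h / h_max if h_max > 0 else 0.0)
```

Computed once per window, both values can feed `temporal_sync` and `asn_entropy` in the snapshot without each detector re-bucketing the same edges.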

### Upgrade 3 — GraphAttractorDetector (The Important One)

Fan-in detection and degree drift detection share a blind spot: they key on *nodes*. A VPN-rotating botnet changes its source nodes — IPs, ASNs, geos — constantly. By the time the detector builds state on one IP, it’s already gone.

**Graph attractors** key on *edge structure* instead. The observation: rotating botnets change their source nodes but they all hit the *same destination infrastructure*. The `edge_hash` is stable even when every `node_a` rotates.

```python
# Tracks edge_hash -> {src_node_ids} over 5-second sliding windows
# Fires AttractorAlert when:
#   cluster_size           > 40    (enough distinct sources)
#   infrastructure_entropy > 0.8   (sources genuinely diverse, not one IP looping)
#   temporal_sync          > 0.75  (coordinated timing)
verdict = "rotating_botnet"
```

To defeat this detector, an attacker would need to either:

1. Use a *different* target on every connection — which defeats the botnet’s actual objective

2. Desynchronise their task scheduling — which defeats coordination

Neither is viable. The attractor is the signal, and it’s extremely robust.
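A toy version of the edge-keyed bookkeeping, with the thresholds copied from above (window expiry, entropy, and sync computation are simplified stand-ins for the real detector):

```python
class GraphAttractorDetector:
    """Keys on edge_hash (destination infrastructure), not rotating sources."""
    CLUSTER_MIN = 40
    ENTROPY_MIN = 0.8
    SYNC_MIN = 0.75

    def __init__(self):
        self._sources = {}  # edge_hash -> set of src_node_ids this window

    def observe(self, edge_hash: int, src_node_id: int):
        # Rotating sources accumulate under the stable edge_hash
        self._sources.setdefault(edge_hash, set()).add(src_node_id)

    def check(self, edge_hash: int, infra_entropy: float, t_sync: float):
        cluster = self._sources.get(edge_hash, set())
        if (len(cluster) > self.CLUSTER_MIN
                and infra_entropy > self.ENTROPY_MIN
                and t_sync > self.SYNC_MIN):
            return "rotating_botnet"
        return None
```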

```
ATTRACTOR [rotating_botnet] edge_hash=0x… cluster=94
infra_H=6.91 t_sync=0.82 score=0.917  ->  Tier 3
```

## Part 4 — BPF Edge Aggregation (bpf_capture.c + capture.go)

The performance math:

```
Raw packets:     100M/sec x 100B/event  = 10 GB/sec telemetry
Phase B edges:   100M/sec x 56B/event   = 5.6 GB/sec
Aggregated:      ~2M/sec  x 56B/event   = 112 MB/sec
```

The edge aggregation trick reduces telemetry volume by ~98% before any data leaves the kernel.

**Kernel side** — new `edge_agg` LRU hash map (128K entries, keyed by `edge_hash`):

```c
struct edge_counter {
    __u64 count;     // packet count for this edge_hash
    __u64 bytes;     // total bytes
    __u64 first_ts;  // ktime of first packet this flush window
    __u64 last_ts;   // ktime of most recent packet
};
```

On every packet in the XDP path: if entry exists, atomic_add count and bytes. If not, insert. That’s all. No ringbuf write, no memory copy, no wakeup to userland.

**Go relay side** — new 50ms ticker goroutine added to `capture.go`:

```go
ticker := time.NewTicker(50 * time.Millisecond)
// Every 50ms: iterate edge_agg map, emit frame 0x05, delete entries
```

Fifty milliseconds of accumulation. At 100M pkt/sec, that’s 5 million packets compressed into however many distinct edge_hash values appear in that window — typically 1-5% of packet count.

## Part 5 — AlmaLinux 10 Kernel Tuning (scripts/tune_kernel.sh, 238 lines)

A properly-tuned kernel for XDP/eBPF looks very different from a default AlmaLinux install. The biggest wins:

```bash
# Socket buffer — handle 128MB bursts without dropping
net.core.rmem_max = 134217728

# NIC -> kernel queue depth — don't drop at the NIC
net.core.netdev_max_backlog = 300000

# Busy-poll — reduces latency on the XDP receive path
net.core.busy_read = 50    # us spin before sleeping
net.core.busy_poll = 50    # us spin on NIC before sleeping

# BPF JIT — without this BPF runs interpreted, 3-5x slower
net.core.bpf_jit_enable = 1
net.core.bpf_jit_harden = 0   # disable hardening in prod for speed

# Huge pages — ring buffer performance headroom
vm.nr_hugepages = 512          # 1 GB of 2MB pages
```

The script also handles NIC IRQ affinity pinning (`--nic <iface>`): one XDP worker per RX queue, NIC queue N -> CPU N. Disables `irqbalance` so the manual pins hold. Sets CPU performance governor. `--dry-run` and `--revert` flags. Idempotent — safe to run repeatedly.

## The Full Stack After Today

```
+------------------------------------------------------------------+
|              AlmaLinux 10 · RTX 3060 · 62.8 GB                   |
+------------------------------------------------------------------+
|  XDP / eBPF (bpf_capture.c)                                      |
|    +-- flow_table     LRU hash 1M flows                          |
|    +-- edge_agg       LRU hash 128K entries — 50ms flush  [NEW]  |
|    +-- rb             ringbuf -> 0x00 FlowCore                   |
|    +-- flow_events    ringbuf -> 0x02 FlowEnd                    |
|    +-- edge_ticks     ringbuf -> 0x03 EdgeTick                   |
|    +-- graph_edges    ringbuf -> 0x04 GraphEdge (OPEN/CLOSE)     |
+------------------------------------------------------------------+
|  Go Relay (capture.go)                                           |
|    5 ringbuf drain goroutines                                    |
|    1 edge_agg flush ticker (50ms) -> 0x05 EdgeAgg         [NEW]  |
|    WebSocket broadcast                                           |
+------------------------------------------------------------------+
|  Python Pipeline (stream_manager.py)                             |
|    topology_drift.py                                             |
|    +-- TopologyDriftDetector      scanner / aggregator / lateral |
|    +-- TemporalFanInDetector      botnet timing sync             |
|    +-- GraphAttractorDetector     rotating botnet         [NEW]  |
|    +-- HypergraphMetricsCollector shared window metrics   [NEW]  |
+------------------------------------------------------------------+
|  GraphOps Autopilot (graphops_autopilot.py)               [NEW]  |
|    SentinelLoop   push-driven, 3 detector subscriptions          |
|    AlertDedup     blake2b hash · 30s TTL                         |
|    TierRouter     0.60 / 0.70 / 0.80 / 0.90                      |
|    EventCard      analyst briefing object + format_card()        |
|    InvestigatorAgent   Tier 3 -> GraphOpsAgent                   |
+------------------------------------------------------------------+
|  GraphOps Copilot (graphops_copilot.py)                   [NEW]  |
|    EntityExtractor       regex + EXAMPLE_VALUES guard            |
|    InvestigativeDSLExecutor   9 verbs, stateful                  |
|    GraphOpsAgent         attractor-proof loop · llama3.2:3b      |
+------------------------------------------------------------------+
|  MCP Server — 11 tools total                                     |
|    graphops_investigate       natural language -> report         |
|    graphops_dsl_exec          raw DSL plan execution             |
|    graphops_entity_parse      entity extraction                  |
|    graphops_autopilot_status  runtime stats               [NEW]  |
|    graphops_suggestion_queue  Tier 1 queue                [NEW]  |
|    graphops_observation_log   Tier 0 log                  [NEW]  |
|    graphops_format_card       render event card           [NEW]  |
+------------------------------------------------------------------+
```

## By The Numbers

| File | Lines | Status |
|---|---|---|
| graphops_copilot.py | 1,380 | 5/5 tests pass · live Ollama validated |
| graphops_autopilot.py | 758 | 24/24 tests pass |
| topology_drift.py | 912 | 3 new classes · smoke tests pass |
| bpf_capture.c | 426 | pending `make bpf` |
| capture.go | 1,000 | pending `go build` |
| scripts/tune_kernel.sh | 238 | syntax valid |

**~2,800 lines of new code today across 6 files.**

All Python tests pass. BPF/Go changes pending hardware compile.

## What’s Next

**Immediate** — `FRAME_TAG_EDGE_AGG = 0x05` handler in `stream_manager.py` and `EdgeAggEvent` in `FlowCore.py`. Same pattern as all other frame types.

**Interesting** — Wire `HypergraphSnapshot` into `GraphAttractorDetector` so the shared metrics precomputation replaces the attractor’s own per-window entropy calls.

**Exciting** — In-kernel botnet detection. Move fan-in counters and temporal burst tracking into BPF maps. Emit `BOTNET_CLUSTER` events directly from the XDP path. The kernel becomes the first anomaly detector. Userland receives only high-value signals. That’s the architecture telecom operators quietly run for abuse detection at hyperscale.

**Wild** — Hypergraph spectral fingerprinting. Compute eigenvalue signatures of traffic clusters. Detects VPN-rotating botnets even when every node and IP changes every minute, by analysing the structural fingerprint of the traffic graph rather than any individual node property.

*NerfEngine is a personal research project combining eBPF/XDP network telemetry, hypergraph analysis, and autonomous LLM-backed threat detection. The architecture described here — kernel-side graph primitive emission, timing-entropy botnet detection, graph attractor analysis, and attractor-proof LLM agent loops — represents genuinely novel combinations of techniques that don’t appear together elsewhere in the open literature. Building it has been one of the most interesting engineering exercises I’ve done.*

*More updates as the build continues. The kernel is listening.*

Blog post written — 380 lines at /home/spectrcyde/NerfEngine/Update Blog Post 03132026.md.

It covers:

  1. GraphOps Copilot — the attractor loop bug, the fix, live test results against llama3.2:3b
  2. GraphOps Autopilot — 4-tier system, AlertDedup, SentinelLoop, scoring logic, 24/24 pass
  3. Three upgrades — cooldowns, HypergraphMetricsCollector, GraphAttractorDetector (the rotating-botnet detector)
  4. BPF edge aggregation — 98% telemetry reduction, kernel + Go relay design
  5. AlmaLinux 10 tuning script — key sysctl values, IRQ pinning, JIT config
  6. Full stack diagram, by-the-numbers table, and what’s next

Extra Special Thanks to ChatGPT!!!

https://chatgpt.com/share/69b3a327-a4c4-8011-a9d0-d82cfc8804ed
