What separates ThatDeveloperGuy from a typical agency: a working federated AI substrate and a public diagnostic tool. The substrate produces retrieval intuition, the diagnostic exposes it in a form anyone can audit against, and the methodology feeds the engagements.
MEGAMIND is a federated three-node-plus-one neural network that runs hand-written pure Go on Linux x86 (Bubbles), macOS M1 (VALKYRIE), macOS M2 (M2), and a standalone research node on macOS M4 (Thunderport). NATS message-passing federation, 4096-to-8192 dimension neural substrate per node, sparse top-K activation, periodic anti-saturation, mild decay. Operating it firsthand gives ThatDeveloperGuy direct intuition for how AI search engines weight, retrieve, and cite content. The lab feeds four public artifacts: the Engine Optimization Diagnostic, The Hive AGI sub-brain, the Engine Optimization API, and the MEGAMIND research log.
4 lab artifacts · ~5 min read · live deploys · Last reshelved 2026-05-05
Lab artifact 1 · Substrate · TDG-INS-04.1
MEGAMIND, the substrate behind the methodology
MEGAMIND is a federated three-node-plus-one neural network. The cluster spans Bubbles (Linux Debian amd64, NATS hub plus brain process, 16 GB RAM), VALKYRIE (macOS M1, 8 GB RAM), and M2 (macOS M2, 8 GB RAM), federated over NATS message-passing transport. A standalone research node runs on Thunderport (macOS M4) outside the LAN federation as a development and experiment platform. All four run the same MADDIE binary, hand-written in pure Go with CGO disabled so the same source compiles cleanly across darwin-arm64 and linux-amd64.
Lab artifact 2 · Architecture · TDG-INS-04.2
Architecture choices that matter for retrieval
MEGAMIND uses a 4096-to-8192-dimension neural substrate per node with sparse top-K activation: only the K highest-magnitude neurons participate in any given pattern store or recall. K is tuned per node size (50 for 4096-neuron sub-brains, 100 for 8192-neuron ones). Sparse activation is what lets the substrate avoid the saturation failure mode that kills threshold-based associative memories. Without it, every Encode call touches every neuron, the W_know weight matrix fills up after roughly a hundred stored patterns, and recall collapses to noise. The substrate also runs periodic anti-saturation: every 2,000 stored patterns, if weight density exceeds 80%, the weakest weights are pruned down to a 60% density target. Mild decay (rate 0.001, prune floor 0.0001) runs every 100 patterns to keep the system fluid.
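The maintenance schedule in that paragraph can be sketched in Go. This is a minimal sketch, not the MADDIE source. It assumes W_know is held as a flat []float64 and that density means the fraction of non-zero weights, and the constant and function names are hypothetical. The numeric values are the ones quoted above. Running decay before the anti-saturation check when both fall on the same pattern count is also an assumption.

```go
package main

import (
	"math"
	"sort"
)

// Hypothetical names; the values are the ones quoted in the text.
const (
	antiSatEvery   = 2000   // check density every 2,000 stored patterns
	antiSatTrigger = 0.80   // prune when density exceeds 80%
	antiSatTarget  = 0.60   // prune down to 60% density
	decayEvery     = 100    // decay every 100 stored patterns
	decayRate      = 0.001  // multiplicative shrink per decay pass
	pruneFloor     = 0.0001 // weights smaller than this are zeroed
)

// density returns the fraction of non-zero weights in w (assumed definition).
func density(w []float64) float64 {
	nz := 0
	for _, v := range w {
		if v != 0 {
			nz++
		}
	}
	return float64(nz) / float64(len(w))
}

// pruneToDensity zeroes the weakest non-zero weights until at most
// target*len(w) of them remain.
func pruneToDensity(w []float64, target float64) {
	var idx []int
	for i, v := range w {
		if v != 0 {
			idx = append(idx, i)
		}
	}
	keep := int(target * float64(len(w)))
	if len(idx) <= keep {
		return
	}
	// Order surviving indices weakest-first by magnitude.
	sort.Slice(idx, func(a, b int) bool {
		return math.Abs(w[idx[a]]) < math.Abs(w[idx[b]])
	})
	for _, i := range idx[:len(idx)-keep] {
		w[i] = 0
	}
}

// decay shrinks every weight by decayRate and zeroes any below pruneFloor.
func decay(w []float64) {
	for i, v := range w {
		v *= 1 - decayRate
		if math.Abs(v) < pruneFloor {
			v = 0
		}
		w[i] = v
	}
}

// afterStore runs the periodic maintenance after the n-th stored pattern.
func afterStore(w []float64, n int) {
	if n%decayEvery == 0 {
		decay(w)
	}
	if n%antiSatEvery == 0 && density(w) > antiSatTrigger {
		pruneToDensity(w, antiSatTarget)
	}
}
```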
Lab artifact 3 · Generalization · TDG-INS-04.3
What running it teaches about AI surfaces
The retrieval mechanics that select an answer from a 4096-dimension sparse top-K substrate are conceptually similar to what billion-parameter retrieval-augmented models do at scale: relevance scoring against an embedding, attention concentration, then a generation step that prefers high-confidence sentences. We do not guess from the outside what an LLM rewards; we run a small retrieval substrate of our own and watch what survives retrieval. When sparse top-K activation prefers tight clusters of specific tokens over diffuse spreads, that observation generalizes: the same preference shows up in the citation patterns of the public AI surfaces. The field notes elsewhere in the library derive from this combined substrate-plus-engagement loop.
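As a generic illustration of the first two steps (relevance scoring against an embedding, then attention concentration), the sketch below ranks candidate vectors by cosine similarity to a query and turns scores into weights with a temperature-scaled softmax. It is a textbook formulation assumed for illustration, not MEGAMIND's recall path or any public model's retrieval stack, and the function names are hypothetical.

```go
package main

import (
	"math"
	"sort"
)

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if either has zero norm.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// rank returns candidate indices ordered by descending similarity to query.
func rank(query []float64, candidates [][]float64) []int {
	order := make([]int, len(candidates))
	scores := make([]float64, len(candidates))
	for i, c := range candidates {
		order[i] = i
		scores[i] = cosine(query, c)
	}
	sort.SliceStable(order, func(a, b int) bool {
		return scores[order[a]] > scores[order[b]]
	})
	return order
}

// softmax converts scores into weights that sum to 1. A lower temperature
// concentrates more of the weight on the highest score.
func softmax(scores []float64, temperature float64) []float64 {
	maxS := math.Inf(-1)
	for _, s := range scores {
		if s > maxS {
			maxS = s
		}
	}
	out := make([]float64, len(scores))
	var sum float64
	for i, s := range scores {
		out[i] = math.Exp((s - maxS) / temperature) // shift by max for stability
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}
```

Lowering the temperature pushes more of the weight onto the top-scoring candidate, which is one simple way to picture attention concentration.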
Lab artifact 4 · Public deploys · TDG-INS-04.4
Public artifacts
The lab feeds four public-facing artifacts. Each one is a real, working deploy that any business owner can use today.
Engine Optimization Diagnostic. A public Node + SQLite + headless Chromium audit pipeline. It grades any URL across the 14-framework signals and produces a downloadable PDF report. Free, no signup, and the same diagnostic ThatDeveloperGuy uses internally on engagements.
The Hive AGI sub-brain. A sub-brain experiment surfacing pattern recall and topic graphs from MEGAMIND. It is a live web preview of substrate-driven topic discovery, useful as a window into how the back-end retrieval actually behaves.
Engine Optimization API. A FastAPI JSON service exposing the audit pipeline plus the brand and entity endpoints used across the ThatDeveloperGuy network: the same data the diagnostic tool reads from, available as machine-readable JSON.
MEGAMIND research log. An ongoing log of substrate experiments, anti-saturation tuning, federation transport behavior, and the open questions in the architecture; it is where the lab journal lives in public.
Lab artifact 5 · Method · TDG-INS-04.5
Why operating > reading
It is possible to learn AEO (answer engine optimization) from blog posts and Google's public guidance. Most agencies do exactly that, and many of them ship competent work. An operating substrate makes its difference in the edge cases: when a piece of content gets cited unexpectedly, or skipped despite looking correct on paper, the substrate supplies an internal hypothesis to test. The hypothesis is sometimes wrong, since the public AI surfaces are not running a 4096-neuron Go substrate, but the discipline of “retrieval prefers X for reason Y; let me test that on a real engagement” produces sharper intuition than reading documentation alone. The methodology, the rubric, and the field notes in this library all came out of that loop.
Lab artifact 6 · Performance · TDG-INS-04.6
Performance metrics from a year of operation
MEGAMIND has run continuously since May 2025 with two scheduled rebuilds (December 2025 for the platform-BLAS split, March 2026 for the SQLite driver replacement). Throughput numbers from the production logs as of May 2026:
Pattern store rate: 14–22 patterns per second per federated node, sustained, across mixed x86 and Apple Silicon hardware. Bubbles (Linux x86, 16 GB RAM) tops the cluster at 22 patterns/second under sparse top-K activation; VALKYRIE (M1, 8 GB RAM) and M2 (M2, 8 GB RAM) hold around 14–18 patterns/second. The standalone Thunderport (M4, 16 GB RAM) reaches 28 patterns/second when it is not also serving as the operator's development machine.
Pattern recall latency: p50 4ms, p95 11ms, p99 28ms for in-memory recall on a 4096-neuron substrate; 2.3x slower at 8192 neurons. The p99 spike comes from anti-saturation pruning passes that fire every 2,000 stored patterns when density exceeds 80%.
Federation transport: NATS message-passing adds 8–14ms median overhead between LAN nodes and 35–90ms between Tailscale-connected nodes (Thunderport sits outside the LAN federation). Transport is rarely the bottleneck; the substrate compute is.
Memory footprint: 512MB for the W_know weight matrix at 4096 neurons (this is the largest single allocation); approximately 40MB additional for indexes and bookkeeping; under 600MB total resident set per node, which is what allowed deployment on 8GB Apple Silicon laptops.
Crawler ingest: 100–200 URLs per minute per node when crawl-workers is set to 10, depending on remote site speed and content size. Pattern extraction from text averages 12 patterns per crawled page after the SetCrawlerWKnow fix landed in February 2026 (before the fix, we were producing 1–2 patterns per page due to a CPU-fallback bug).
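The memory figures can be sanity-checked with simple arithmetic. Assuming W_know is a dense N×N matrix of float64 values (the sample code later on this page uses float64 for patterns, but the matrix layout is not stated here), its size is N² × 8 bytes: 128 MiB at 4096 neurons and 512 MiB at 8192. Under that assumption the 512 MB figure lines up with the 8192-neuron substrate, or with a wider or non-square layout at 4096 neurons. The helper names below are hypothetical.

```go
package main

const mib = 1 << 20 // bytes per MiB

// denseMatrixMiB returns the size in MiB of a dense n×n matrix whose
// elements are elemBytes wide (8 for float64, 4 for float32).
func denseMatrixMiB(n, elemBytes int) int {
	return n * n * elemBytes / mib
}

// fitsBudget reports whether the matrix plus fixed overhead (indexes,
// bookkeeping) stays under a resident-set budget, all in MiB.
func fitsBudget(n, elemBytes, overheadMiB, budgetMiB int) bool {
	return denseMatrixMiB(n, elemBytes)+overheadMiB < budgetMiB
}
```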
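The crawl-workers setting describes a fixed-size worker pool. A minimal Go sketch of that shape follows. Here extract is a hypothetical stand-in for fetching a page and pulling patterns out of it, so the sketch contains no network code and says nothing about how MADDIE's crawler actually fetches or parses.

```go
package main

import "sync"

// crawlAll fans urls out to a fixed pool of workers (the crawl-workers
// setting) and sums the patterns each page yields. extract is a
// hypothetical callback standing in for fetch plus pattern extraction.
func crawlAll(urls []string, workers int, extract func(url string) int) int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				results <- extract(u)
			}
		}()
	}
	// Feed every URL, then signal that no more jobs are coming.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
	}()
	// Close results only after every worker has exited.
	go func() {
		wg.Wait()
		close(results)
	}()
	total := 0
	for n := range results {
		total += n
	}
	return total
}
```

With 10 workers and an extractor that yields 12 patterns per page, 30 pages produce 360 patterns; the pool size bounds concurrency, not total work.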
Lab artifact 7 · Failure modes · TDG-INS-04.7
Failure modes encountered (and what they taught us)
Five failure modes have been documented in production over the last twelve months. Each one moved the methodology that ships to client engagements.
1. W_know saturation. Threshold-based sparse learning combined with z-score-normalized encoding caused the W_know weight matrix to fill up within roughly 100 stored patterns, after which recall collapsed to noise. What it taught us: sparse activation is essential, not optional, for any threshold-based associative memory. The fix was top-K sparse activation, but the failure mode generalizes: any retrieval system that treats “every signal contributes” without compression eventually saturates. We see the same pattern in agencies that build links to every page indiscriminately rather than concentrating on a handful of authoritative pillar pages.
2. Encoding mismatch between paths. The recall path used a different vector hash than the crawler path, so patterns stored from crawled content could never be recalled by user queries. What it taught us: the path from query to retrieval is fragile and breaks silently. Always-on integration tests across the full path (not unit tests of individual components) are mandatory. Generalization: AEO citation tracking that only checks “is the schema valid” misses the 80% case where schema is valid but the content shape is wrong.
3. CGO build-time fragmentation. The original go-sqlite3 dependency required CGO, which fragmented the build matrix across darwin-arm64, linux-amd64, and various test environments. What it taught us: dependency choice cascades into operational complexity for years. The fix (modernc.org/sqlite, pure Go) cost a week of testing but eliminated an entire class of platform-specific bugs. Generalization: choose tools with the smallest blast radius, even at modest functional cost.
4. Continuous learner runaway. The original continuous-learner module would download large HuggingFace models on startup; on a 16 GB RAM Linux box this caused OOM-kill loops. What it taught us: default to off for any feature with side effects on storage, memory, or network. The fix was a feature flag (currently disabled) plus explicit opt-in. Generalization: any AI integration that auto-downloads models on first run is operationally hostile and should be wrapped in explicit flags.
5. Ghost peer from dual heartbeat formats. MADDIE nodes were sending heartbeats in both the old (node_id) and the new (node_name) format during a migration window; sub-brains parsing the old format created a ghost peer with empty fields. What it taught us: migration windows must be explicit and time-bounded. Maintaining backward compatibility “just in case” produces ghost state. Generalization: in an SEO context, leaving stale 410-Gone URLs in sitemaps creates the same kind of ghost state for crawlers.
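The W_know saturation entry (failure mode 1) comes down to how many weights each store touches. In the toy Hebbian outer-product store sketched below (an assumed learning rule for illustration, not MEGAMIND's encoder), a dense pattern writes to every cell of the n×n matrix on its first store, while a top-K pattern writes only k² cells. At n = 4096 and K = 50, one store touches about 0.015% of the matrix. The function names are hypothetical.

```go
package main

// hebbianStore adds the outer product of p with itself into the flattened
// n×n matrix w, skipping zero entries so a pattern with k non-zeros writes
// only k*k cells. A toy model, not MEGAMIND's actual learning rule.
func hebbianStore(w []float64, n int, p []float64) {
	for i := 0; i < n; i++ {
		if p[i] == 0 {
			continue
		}
		for j := 0; j < n; j++ {
			if p[j] != 0 {
				w[i*n+j] += p[i] * p[j]
			}
		}
	}
}

// nonZeroFraction reports how much of w has been written to.
func nonZeroFraction(w []float64) float64 {
	nz := 0
	for _, v := range w {
		if v != 0 {
			nz++
		}
	}
	return float64(nz) / float64(len(w))
}

// touchedPerStore is the fraction of an n×n matrix that one top-K store
// can write: (k/n)². A dense pattern (k = n) gives 1.0.
func touchedPerStore(n, k int) float64 {
	return float64(k*k) / float64(n*n)
}
```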
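For failure mode 2 (encoding mismatch between paths), the structural fix is that both paths call one encoder. A minimal sketch follows, assuming a hashed bag-of-tokens encoding with FNV-1a. The page does not say which hash MEGAMIND uses, so the hash is an assumption. The sketch also includes the equality check an end-to-end integration test would assert.

```go
package main

import (
	"hash/fnv"
	"strings"
)

// encode hashes each lower-cased, whitespace-separated token into one of
// dim buckets with FNV-1a (an assumed scheme; dim must be positive). The
// crawler (store) path and the user-query (recall) path both call this
// single function, so they cannot drift apart.
func encode(text string, dim int) []float64 {
	v := make([]float64, dim)
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New32a()
		h.Write([]byte(tok))
		v[h.Sum32()%uint32(dim)]++
	}
	return v
}

// sameEncoding is the cross-path assertion: a vector stored from crawled
// text must equal the vector produced for the same text as a query.
func sameEncoding(stored, queried []float64) bool {
	if len(stored) != len(queried) {
		return false
	}
	for i := range stored {
		if stored[i] != queried[i] {
			return false
		}
	}
	return true
}
```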
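Failure mode 4's rule (off by default, explicit opt-in) maps directly onto Go's standard flag package. The flag name continuous-learner and the config type are hypothetical, not MADDIE's actual flags.

```go
package main

import "flag"

// config holds opt-in switches. Every feature with side effects on
// storage, memory, or network defaults to false.
type config struct {
	ContinuousLearner bool
}

// parseConfig parses args with a dedicated FlagSet so it can be tested
// without touching the process-wide flag state.
func parseConfig(args []string) (config, error) {
	fs := flag.NewFlagSet("maddie", flag.ContinueOnError)
	var c config
	fs.BoolVar(&c.ContinuousLearner, "continuous-learner", false,
		"download models and run the continuous learner (off by default)")
	err := fs.Parse(args)
	return c, err
}
```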
Lab artifact 8 · Transfer · TDG-INS-04.8
What from MEGAMIND generalizes to public AI surfaces
The public AI surfaces (ChatGPT, Claude, Perplexity, Google AI Overviews, Bing Copilot) are not running a 4096-neuron Go substrate. They are running billion-parameter transformer models with their own retrieval pipelines. The intuition transfers because retrieval primitives are similar even when implementations differ wildly. Three concrete generalizations have been validated against client engagement outcomes:
Sparse-attention preference for tight token clusters. Inside MEGAMIND, sparse top-K activation prefers tight clusters of specific tokens over diffuse spreads of generic tokens. This generalizes to public AI surfaces in the form of citation preference for pages that lead with specific named entities and concrete claims, over pages that lead with generic phrasing (“cutting-edge solutions for your business” type prose). The mechanical reason in MEGAMIND is that the K highest-magnitude neurons concentrate around specific tokens; the analogous reason in transformer-based retrieval is that named entities have higher embedding-similarity to specific queries than generic phrases do.
Position-of-first-mention dominance. Inside MEGAMIND, the encoder weights early-position tokens more heavily, and recall degrades sharply when the brand name is buried after the first 20% of a pattern's vector. This generalizes to public AI surfaces in the form of the “put the brand name in the first sentence of the H1” field note documented under field-notes. The mechanical reason in MEGAMIND is just attention-budget concentration; the analogous reason in transformers is positional encoding plus retrieval-side attention bias toward early tokens.
Compression survival under truncation. Inside MEGAMIND, patterns survive sparse-pruning passes when they have high magnitude on a few specific neurons; patterns with diffuse low-magnitude representations get pruned first. This generalizes to public AI surfaces in the form of citation preference for sentences that survive token-level truncation; sentences that are short, syntactically clean, subject-verb-object structured, and free of hedge language survive the truncation that happens between retrieval and generation. Long, comma-stuffed, hedge-heavy sentences get truncated and the brand citation gets dropped.
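For the position-of-first-mention effect, a toy encoder makes the mechanism concrete. It gives token i of n a weight that decays exponentially with relative position, then measures what share of the total weight the brand token carries. The exponential curve and its steepness parameter are assumptions for illustration; the page does not describe MEGAMIND's actual positional weighting, and the function names are hypothetical.

```go
package main

import (
	"math"
	"strings"
)

// positionWeight gives token i of n a weight that decays exponentially with
// relative position. The curve and steepness are illustrative assumptions.
func positionWeight(i, n int, steepness float64) float64 {
	return math.Exp(-steepness * float64(i) / float64(n))
}

// termShare returns the fraction of total positional weight carried by
// occurrences of term (case-insensitive, whitespace-tokenized) in text.
func termShare(text, term string, steepness float64) float64 {
	term = strings.ToLower(term)
	toks := strings.Fields(strings.ToLower(text))
	var hit, total float64
	for i, t := range toks {
		w := positionWeight(i, len(toks), steepness)
		total += w
		if t == term {
			hit += w
		}
	}
	if total == 0 {
		return 0
	}
	return hit / total
}
```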
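The compression-survival point can be measured directly. Keep a pattern's K largest-magnitude entries, as sparseTopK does, and report what share of its total magnitude survives. The sketch below uses a hypothetical function name. A pattern concentrated on a few entries keeps most of its mass, while a diffuse one with the same total loses most of it.

```go
package main

import (
	"math"
	"sort"
)

// retainedFraction reports what share of a pattern's total absolute
// magnitude survives when only its k largest-magnitude entries are kept.
func retainedFraction(pattern []float64, k int) float64 {
	mags := make([]float64, len(pattern))
	var total float64
	for i, v := range pattern {
		mags[i] = math.Abs(v)
		total += mags[i]
	}
	if total == 0 {
		return 0
	}
	// Largest magnitudes first.
	sort.Sort(sort.Reverse(sort.Float64Slice(mags)))
	if k > len(mags) {
		k = len(mags)
	}
	if k < 0 {
		k = 0
	}
	var kept float64
	for _, m := range mags[:k] {
		kept += m
	}
	return kept / total
}
```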
Lab artifact 9 · Open research · TDG-INS-04.9
Open research questions
Five questions remain open as of May 2026. Progress on these feeds back into the methodology.
Q1: Is K-tuning monotonic with citation rate? We know K=50 works for 4096-neuron sub-brains and K=100 for 8192-neuron substrates. We do not know whether K=80 on a 6144-neuron substrate would outperform either, or whether the relationship between K and substrate-size is linear, square-root, or has discontinuities. The next benchmark scheduled for Q3 2026 sweeps K from 30 to 200 across substrate sizes 2048, 4096, 6144, 8192 and measures both recall accuracy and pattern-store rate.
Q2: How does federation latency affect cluster intelligence? The LAN cluster (Bubbles + VALKYRIE + M2) federated over NATS shows emergent behavior absent in any individual node — specifically, the cluster recalls patterns first stored on a different node within 30–80ms median. Whether this constitutes “cluster intelligence” in any meaningful sense, or just distributed lookup, is unclear. The benchmark is whether the cluster outperforms the strongest individual node on a held-out test set.
Q3: Can substrate-derived intuitions predict client outcomes before measurement? We have a track record of substrate observations predicting field outcomes after the fact. Whether a substrate observation made today can predict, prospectively, which client engagement will see the largest AEO citation lift in the next 90 days is the higher bar and remains untested.
Q4: Does the lab benefit from a fifth node, and if so, in what configuration? Bubbles + VALKYRIE + M2 + Thunderport gives us four nodes spanning Linux x86, M1, M2, and M4. Adding a fifth node (likely an M-series Mac Studio or a small NVIDIA box) would double the cluster's sustained pattern-store rate but would add federation overhead. The breakeven point is somewhere between 12 and 18 patterns/sec sustained per added node, but we have not measured it precisely.
Q5: Is the Pagemaster aesthetic a citation signal? A more whimsical question. We have observed that pages with strong narrative metaphor and clear semantic structure (e.g., the library metaphor on this site) get cited at higher rates than functionally equivalent pages with abstract corporate framing, holding the AEO Readiness Index score constant. Whether this is a real citation signal or an artifact of the small sample size is unknown, and we have not yet designed a clean test.
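For Q1, the sweep grid and two candidate scaling rules are easy to write down. The K step size of 10 is an assumption (only the 30–200 range is stated), and all names are hypothetical. One arithmetic note: the two tuned points sit on a straight line through the origin (50/4096 = 100/8192). A linear rule therefore reproduces both points and predicts K = 75 at 6144 neurons, while a square-root rule anchored at 4096 neurons would predict about 71 at 8192. Two points cannot settle the question, which is what the sweep is for.

```go
package main

import "math"

// sweepPoint is one cell of the Q1 benchmark grid.
type sweepPoint struct{ Neurons, K int }

// sweepGrid enumerates every (size, K) pair. The step is an assumption.
func sweepGrid(sizes []int, kMin, kMax, step int) []sweepPoint {
	if step < 1 {
		step = 1 // guard against an infinite loop
	}
	var pts []sweepPoint
	for _, n := range sizes {
		for k := kMin; k <= kMax; k += step {
			pts = append(pts, sweepPoint{n, k})
		}
	}
	return pts
}

// linearK scales the tuned K=50 at 4096 neurons proportionally with size.
func linearK(neurons int) int { return 50 * neurons / 4096 }

// sqrtK scales the same anchor with the square root of size instead.
func sqrtK(neurons int) int {
	return int(math.Round(50 * math.Sqrt(float64(neurons)/4096)))
}

// nonDecreasing reports whether a metric never drops as K increases,
// i.e. whether it is monotonic over the sweep.
func nonDecreasing(vals []float64) bool {
	for i := 1; i < len(vals); i++ {
		if vals[i] < vals[i-1] {
			return false
		}
	}
	return true
}
```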
Lab artifact 10 · Code · TDG-INS-04.10
Sample code — the sparse top-K activation pass
For readers who want to see what the substrate code actually looks like, this is the inner-loop sparse top-K activation function from pkg/substrate/recall.go. The full file is around 850 lines; the snippet below is the core selection step that turns a dense activation vector into a sparse one by zeroing all but the K highest-magnitude entries.
package substrate

import (
	"math"
	"sort"
)

// sparseTopK keeps only the K highest-magnitude entries in 'pattern',
// zeroing the rest. K is tuned per substrate size: 50 for 4096-neuron
// brains, 100 for 8192. Lower K = sparser = less saturation, but also
// lower expressiveness. The default K was tuned empirically against
// the W_know saturation failure mode documented in field-notes.
func sparseTopK(pattern []float64, k int) {
	if k >= len(pattern) {
		return // K covers every entry; nothing to zero
	}
	if k < 0 {
		k = 0 // a negative K would index out of range below; keep nothing instead
	}
	type entry struct {
		idx int
		mag float64
	}
	// Pair each index with its magnitude, then sort strongest-first.
	entries := make([]entry, len(pattern))
	for i, v := range pattern {
		entries[i] = entry{i, math.Abs(v)}
	}
	sort.Slice(entries, func(i, j int) bool {
		return entries[i].mag > entries[j].mag
	})
	// Zero every entry ranked below the top K.
	for i := k; i < len(entries); i++ {
		pattern[entries[i].idx] = 0
	}
}
The implementation is naive (a full O(n log n) sort instead of an average-O(n) partial quickselect for the top K), but the substrate runs at 14–22 patterns/second sustained, which is well within our budget. The simplicity of the code is itself part of what makes the lab maintainable; we have rejected three more sophisticated implementations because they did not produce measurable retrieval-quality differences and added complexity to the production deploy.
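For comparison, one standard alternative to the full sort is a bounded min-heap that keeps the K strongest indices seen so far, which runs in O(n log K). The sketch below uses container/heap. It illustrates the technique and is not one of the three implementations the lab rejected. It may also break ties between equal magnitudes differently from the sorted version.

```go
package main

import (
	"container/heap"
	"math"
)

// magHeap is a min-heap of indices into pat, ordered by |pat[i]|, so the
// root is always the weakest of the entries currently kept.
type magHeap struct {
	idx []int
	pat []float64
}

func (h magHeap) Len() int           { return len(h.idx) }
func (h magHeap) Less(a, b int) bool { return math.Abs(h.pat[h.idx[a]]) < math.Abs(h.pat[h.idx[b]]) }
func (h magHeap) Swap(a, b int)      { h.idx[a], h.idx[b] = h.idx[b], h.idx[a] }

func (h *magHeap) Push(x interface{}) { h.idx = append(h.idx, x.(int)) }

func (h *magHeap) Pop() interface{} {
	last := h.idx[len(h.idx)-1]
	h.idx = h.idx[:len(h.idx)-1]
	return last
}

// sparseTopKHeap zeroes all but the k highest-magnitude entries of pattern
// in O(n log k) time.
func sparseTopKHeap(pattern []float64, k int) {
	if k >= len(pattern) {
		return
	}
	if k < 0 {
		k = 0
	}
	h := &magHeap{pat: pattern}
	for i := range pattern {
		if h.Len() < k {
			heap.Push(h, i)
		} else if k > 0 && math.Abs(pattern[i]) > math.Abs(pattern[h.idx[0]]) {
			h.idx[0] = i // replace the weakest kept entry
			heap.Fix(h, 0)
		}
	}
	keep := make([]bool, len(pattern))
	for _, i := range h.idx {
		keep[i] = true
	}
	for i := range pattern {
		if !keep[i] {
			pattern[i] = 0
		}
	}
}
```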