Insights · Methodology
The AEO Readiness Index
The AEO Readiness Index is a 100 point methodology ThatDeveloperGuy uses to grade a small business site for answer engine optimization. It documents what is measured, how each signal is scored, and how a business owner can self audit before commissioning work. Five signal groups, twenty checks, hard thresholds, defensible.
Why publish a methodology?
AEO is a discipline with no standards body. Most agencies sell it without saying what it is. Publishing the rubric makes the engagement falsifiable. A client can score themselves before signing, score themselves again after delivery, and verify what changed.
Search engine optimization had twenty years to standardize. Answer engine optimization is two years old as of April 2026. There is no W3C working group, no Google guideline, no Bing checklist for AEO. Vendors fill the vacuum with claims. ThatDeveloperGuy publishes its scoring rubric in full so that a prospect can run the same audit independently. If the rubric scores a site and a competitor disputes the score, the dispute is technical and resolvable. If the rubric is private, the engagement is faith based. We pick the first.
What does the index measure?
Five groups of signals, twenty checks total, twenty points per group. Entity triangulation. Schema coverage. Citation surface. Crawler accessibility. Content density. A perfect score is 100. A passing score is 70. The cutoff for a site good enough to be cited by a frontier model is 85.
Group one is entity triangulation. Does the brand have a Wikidata Q-id? Does the operator? Does the site declare these in JSON-LD via sameAs? Are the LinkedIn, GitHub, and HuggingFace handles consistent across all declared profiles? Does at least one external authoritative source (an industry directory, a chamber of commerce, a SAM.gov record for SDVOSBs) link back to the canonical domain? Twenty points possible.
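The handle-consistency question in this group lends itself to a quick script. The sketch below is illustrative only: it assumes the profile handle is the last path segment of each sameAs URL, and the `handle_of` and `handles_consistent` helpers are hypothetical names, not part of the published rubric.

```python
from urllib.parse import urlparse

# Hosts named in the rubric text. Treating the last path segment as the
# handle is an assumption made for this sketch.
PROFILE_HOSTS = ("linkedin.com", "github.com", "huggingface.co")


def handle_of(url: str) -> str:
    """Return the last path segment of a profile URL, lowercased."""
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1].lower()


def handles_consistent(same_as: list[str]) -> bool:
    """True when at least one profile URL is declared and all share one handle."""
    handles = set()
    for url in same_as:
        host = urlparse(url).netloc.lower().removeprefix("www.")
        if host in PROFILE_HOSTS:
            handles.add(handle_of(url))
    return len(handles) == 1
```

Passing the sameAs array from the Organization JSON-LD gives a yes or no answer; URLs on other hosts (for example a Wikidata record) are ignored by this particular check.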
Group two is schema coverage. Organization, Person, WebSite, BreadcrumbList present on every page. Service or Product schema on every monetizable page. FAQPage schema on every page that contains question structured content. Article schema on every blog post. LocalBusiness with full address and hours where applicable. Project or CreativeWork schema on every case study. Twenty points possible.
Group three is citation surface. Presence of llms.txt at the site root. Presence of llms-full.txt with at least the service catalog, FAQ block, and entity identifiers. Presence of robots.txt with explicit allow lines for the major AI crawlers including GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google-Extended, and Applebot-Extended. Presence of an aeo.json or equivalent structured exports. Twenty points possible.
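The robots.txt part of this group can be checked mechanically. The following is a minimal sketch using Python's standard `urllib.robotparser`; the `named_agents` and `audit_robots` helpers are hypothetical, and the simple line scan assumes a conventionally formatted file. It reports two things per crawler: whether the file names it explicitly, and whether the parsed rules let it fetch the site root.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the rubric text above.
AI_CRAWLERS = [
    "GPTBot", "ClaudeBot", "PerplexityBot",
    "OAI-SearchBot", "Google-Extended", "Applebot-Extended",
]


def named_agents(robots_txt: str) -> set[str]:
    """Collect every User-agent token that appears in the file, lowercased."""
    agents = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.lower().startswith("user-agent:"):
            agents.add(line.split(":", 1)[1].strip().lower())
    return agents


def audit_robots(robots_txt: str) -> dict[str, dict[str, bool]]:
    """For each AI crawler: is it named explicitly, and may it fetch '/'?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    named = named_agents(robots_txt)
    return {
        bot: {
            "explicit": bot.lower() in named,
            "can_fetch_root": parser.can_fetch(bot, "/"),
        }
        for bot in AI_CRAWLERS
    }
```

The two flags are kept separate on purpose: a crawler that is not named is still allowed when no wildcard rule disallows it, so "explicit" reflects the rubric's check while "can_fetch_root" reflects what the parsed rules actually permit.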
Group four is crawler accessibility. Time to first byte under 600 milliseconds for the homepage. No client side rendering required to reach primary content. No paywall on entity declarations. Sitemap submitted to Bing and Google. IndexNow key file present and active. Mobile usable on first paint. Twenty points possible.
Group five is content density. Every H2 has an answer capsule of forty to eighty words structured as bottom line up front. The site contains at least twelve thousand words of original substantive prose across primary navigation. Pricing is named in dollars. Service tiers are listed with what is included rather than only marketing language. Twenty points possible.
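The arithmetic behind the thresholds is simple enough to sketch. The snippet below assumes the five groups of twenty points and the 70 and 85 cutoffs stated above; the group keys, function names, and band labels are illustrative choices, not official terminology.

```python
GROUPS = (
    "entity_triangulation", "schema_coverage", "citation_surface",
    "crawler_accessibility", "content_density",
)
MAX_PER_GROUP = 20
PASSING = 70
CITATION_CUTOFF = 85


def total_score(scores: dict[str, int]) -> int:
    """Sum the five group scores, rejecting values outside 0..20."""
    for group in GROUPS:
        if not 0 <= scores[group] <= MAX_PER_GROUP:
            raise ValueError(f"{group} must be between 0 and {MAX_PER_GROUP}")
    return sum(scores[group] for group in GROUPS)


def band(total: int) -> str:
    """Map a total to the thresholds named in the text (labels are illustrative)."""
    if total >= CITATION_CUTOFF:
        return "citation-ready"
    if total >= PASSING:
        return "passing"
    return "below passing"
```

For example, group scores of 18, 16, 14, 15, and 12 total 75, which clears the passing line but not the citation cutoff.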
How can a business owner self audit?
Open four browser tabs. View source on the homepage and search for the JSON-LD blocks. Visit your domain plus slash llms.txt and slash robots.txt directly. Run a Lighthouse report. Match the output against the twenty checks above. The audit takes twenty minutes if you are technical and an hour if you are not.
The mechanical procedure is short. View source on the homepage. Look for at least three application/ld+json blocks. Confirm one is an Organization, one is a Person, one is a BreadcrumbList. If sameAs is missing or thin, deduct points. Visit https://yourdomain.com/llms.txt. If it returns 404, deduct points. Visit https://yourdomain.com/robots.txt. Search for GPTBot and PerplexityBot. If absent, deduct points. Run Lighthouse on a fresh tab. If the performance score on mobile is below 90, deduct points.
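The view-source step can be approximated with a short script. This is a sketch using only the standard library, assuming you already have the homepage HTML as a string; `JsonLdCollector`, `declared_types`, and `missing_types` are hypothetical helper names. It collects the `@type` values declared in application/ld+json blocks, including those nested in an `@graph` array.

```python
import json
from html.parser import HTMLParser

REQUIRED_TYPES = {"Organization", "Person", "BreadcrumbList"}


class JsonLdCollector(HTMLParser):
    """Collect the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed block: skip it rather than crash the audit


def declared_types(html: str) -> set[str]:
    """Return every @type found at the top level or inside @graph arrays."""
    collector = JsonLdCollector()
    collector.feed(html)
    types, stack = set(), list(collector.blocks)
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            stack.extend(node)
        elif isinstance(node, dict):
            declared = node.get("@type")
            if isinstance(declared, list):
                types.update(declared)
            elif declared:
                types.add(declared)
            graph = node.get("@graph")
            if isinstance(graph, list):
                stack.extend(graph)
    return types


def missing_types(html: str) -> set[str]:
    """Which of Organization, Person, BreadcrumbList are not declared."""
    return REQUIRED_TYPES - declared_types(html)
```

An empty set from `missing_types` means all three blocks named in the procedure are present; it does not judge whether sameAs is thin, which still needs a human look.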
The non mechanical procedure is harder. Open ChatGPT in incognito and ask it a question your customer would ask, in your geography, about your service. If your business is not named in the response, your AEO is failing regardless of what your schema says. Repeat in Perplexity, Gemini, and Claude, then do the same for three more customer questions. Four tools, four questions, sixteen possible citations. The number of citations is the most honest score available.
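The tally itself (four tools, four questions, sixteen possible citations) is easy to keep honest in a few lines. This sketch assumes you record by hand whether each answer named your business; the `citation_count` helper and the sample results are hypothetical.

```python
TOOLS = ("ChatGPT", "Perplexity", "Gemini", "Claude")


def citation_count(results: dict[str, list[bool]]) -> tuple[int, int]:
    """Return (citations earned, citations possible) across all tools."""
    cited = sum(sum(answers) for answers in results.values())
    possible = sum(len(answers) for answers in results.values())
    return cited, possible


# Hypothetical manual results: one True/False per question asked in each tool.
example = {
    "ChatGPT": [True, False, False, True],
    "Perplexity": [True, True, False, False],
    "Gemini": [False, False, False, False],
    "Claude": [False, True, False, False],
}
cited, possible = citation_count(example)
print(f"{cited} of {possible} possible citations")
```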
What the rubric does not promise
The index measures readiness, not ranking. A perfect score is necessary but not sufficient for citation. Frontier models retrieve from a noisy index that includes social proof, query intent matching, and recency factors that no client can fully control. The index quantifies what is in your control.
A site can score 100 and still not be cited if the underlying business is too new for any external corpus to mention. The rubric measures the controllable surface. It will not generate organic mentions on Reddit, Quora, or industry forums. It will not buy you a Wikipedia page. It will not fix a brand that has earned a negative reputation. What it will do is guarantee that, when an answer engine retrieves a result for your category in your geography, your site is one of the technically eligible candidates rather than one of the disqualified ones.
How does ThatDeveloperGuy use the index in engagements?
Every engagement opens with a baseline score and closes with a delivery score. The two scores plus the diff between them are part of the project deliverable. A T1 Foundation tier targets seventy points. T3 AI Domination targets ninety. The 14 Tier Bundle targets one hundred.
The free audit at thatdeveloperguy.com/free-audit/ runs an automated subset of the rubric and returns a score within fifteen minutes of submission, then a follow up review from Joseph within twenty four hours. Paid engagements run the full rubric, document each deduction, and ship a remediation plan. The before and after scores are provided in writing, and the methodology is the rubric on this page. Nothing is hidden.
Version and sources
This is version 1.0 of the AEO Readiness Index, published April 2026. Updates are versioned and dated. The next revision is scheduled for September 2026 to incorporate any changes to schema.org vocabulary and crawler allowlists released during summer 2026.
Sources used in compilation: schema.org documentation, the IndexNow specification at indexnow.org, the llms.txt proposal at llmstxt.org, Google Search Central guidelines for structured data, Bing Webmaster guidelines, and observed behavior of OpenAI, Anthropic, Perplexity, and Google retrieval crawlers in production logs across the ThatDeveloperGuy client portfolio. Where this rubric departs from a published guideline, the divergence is documented in the section above.
Want your site scored?
The free audit runs the automated subset and returns a score with a remediation plan. No retainer, no deposit.
Per signal group deep dive
The AEO Readiness Index scores 100 points across five signal groups. Each group has its own weighting, scoring discipline, and most-common-failure mode. The per-group breakdown below covers what is tested, why it matters in 2026, and where the rubric is calibrated to be strict versus lenient.
Group 1 · Foundation signals · RI-SG.01
Foundation signals (25 points)
Foundation signals are the technical baseline: indexability, sitemaps, robots, canonicals, IndexNow, Google Search Console (GSC) verification, Bing Webmaster Tools (BWT) verification, security headers, HTTPS hygiene. The 25-point weighting reflects that a broken foundation invalidates everything else: a site with a broken canonical strategy or an AI-crawler-blocking robots.txt does not earn citations regardless of content quality. The rubric scores most checks as binary (a sitemap either exists and validates or it does not) and gives partial credit on graded checks (security headers grade A+ versus A versus B).
Strict checks: HTTPS valid certificate, robots.txt explicit AI crawler allows, GSC + BWT verification, IndexNow active. These are mechanical and either work or do not. Graded checks: security headers (A+ scores full, A scores 80%, B scores 50%, anything else scores 25% or zero), sitemap structure (per-page-type sitemaps score full, single flat sitemap scores 60%, missing scores zero), Core Web Vitals (LCP/INP/CLS each scored separately on the 75th-percentile field data from CrUX). Most common failure: a robots.txt that predates AI crawlers and has no explicit allow rules for them, so any broad Disallow rule silently applies to ClaudeBot or GPTBot as well. This often costs 6–10 points by itself. Tier 1 of the 14-tier framework covers this signal group end-to-end.
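The graded checks above reduce to a lookup from an observed grade to a credit fraction. A minimal sketch follows, with two loud assumptions: the text does not say how many points each individual check is worth, so `max_points` is a caller-supplied placeholder, and the text leaves "anything else" at 25% or zero, so the `fallback` default of 0.25 is only one of the two readings.

```python
# Credit fractions as stated in the text.
SECURITY_HEADER_CREDIT = {"A+": 1.0, "A": 0.8, "B": 0.5}
SITEMAP_CREDIT = {"per-page-type": 1.0, "flat": 0.6, "missing": 0.0}


def graded_points(max_points: float, credit: float) -> float:
    """Scale a check's maximum points by its credit fraction."""
    return round(max_points * credit, 2)


def security_header_points(grade: str, max_points: float,
                           fallback: float = 0.25) -> float:
    """Points for a security-header grade; fallback covers unlisted grades."""
    return graded_points(max_points, SECURITY_HEADER_CREDIT.get(grade, fallback))


def sitemap_points(structure: str, max_points: float) -> float:
    """Points for sitemap structure: 'per-page-type', 'flat', or 'missing'."""
    return graded_points(max_points, SITEMAP_CREDIT[structure])
```

With a hypothetical 5-point check, a grade of A would earn 4.0 points and a flat sitemap would earn 3.0.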
Group 2 · On page signals · RI-SG.02
On page signals (15 points)
On-page signals cover title tags, meta descriptions, heading hierarchy, internal linking, breadcrumb schema, image alt text, and URL structure. The 15-point weighting is intentionally lower than Foundation and AEO Core because on-page signals have lost relative importance as Google's ranking algorithm has shifted weight toward entity and content authority. They remain table stakes; a site with broken H1 hierarchy or generic meta descriptions cannot rank on competitive queries.
Strict checks: single H1 per page (zero or multiple H1 fails the check), BreadcrumbList JSON-LD validates against Schema.org Validator, every canonical URL returns 200 OK. Graded checks: title tag query-leadership (primary query first with brand at the end scores full, brand-first scores 50%, no brand scores 70% and raises a separate signal flag), meta description value-density (specific numbers and commitments score full; generic descriptions score 40%), internal linking topic clustering (5-7 pillars with cross-links scores full, flat hub-and-spoke scores 60%, no clustering scores 30%). Most common failures: title tags that put the brand first instead of the query, and multiple H1s on a page (often a CMS theme issue rather than intentional content). Tier 2 of the 14-tier framework covers this signal group.
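Two of these checks are straightforward to automate. The sketch below uses the standard-library HTML parser for the single-H1 rule and a simple string comparison for title query-leadership; `passes_single_h1` and `title_credit` are hypothetical helpers, and matching the query and brand by case-insensitive prefix and suffix is an assumption about how "lead with" and "at end" would be tested.

```python
from html.parser import HTMLParser
from typing import Optional


class H1Counter(HTMLParser):
    """Count opening <h1> tags in a document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.count += 1


def passes_single_h1(html: str) -> bool:
    """Strict check: exactly one H1; zero or several both fail."""
    counter = H1Counter()
    counter.feed(html)
    return counter.count == 1


def title_credit(title: str, query: str, brand: str) -> Optional[float]:
    """Credit fraction for a title tag, or None for shapes the text does not grade."""
    t, q, b = title.strip().lower(), query.lower(), brand.lower()
    if b not in t:
        return 0.7  # no brand
    if t.startswith(b):
        return 0.5  # brand-first
    if t.startswith(q) and t.endswith(b):
        return 1.0  # query first, brand at the end
    return None
```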
Group 3 · AEO core signals · RI-SG.03
AEO core signals (30 points)
AEO Core is the heaviest-weighted signal group at 30 points because it is the discipline most directly responsible for AI surface citation rate in 2026. The group covers answer capsule shape, FAQPage schema shape, SpeakableSpecification, llms.txt at domain root, llms-full.txt long-form, AI crawler explicit allowlist, sentence-level discipline, and key-claims block on method pages. The rubric is calibrated strict here because the lift from getting these right is large and the cost of getting them wrong is invisible (the page does not get cited; the operator does not see the queries it was bypassed on).
Strict checks: llms.txt exists at /llms.txt and is fetchable; llms-full.txt exists at /llms-full.txt and is at least 5,000 words; FAQPage schema validates and matches visible body content; SpeakableSpecification validates. Graded checks: answer capsule shape (220–280 chars with brand in first clause scores full, brand absent or capsule too long scores 30–60%), FAQPage Q&A count (5–7 questions scores full, 12+ scores 50%, 1–2 scores 30%), llms-full.txt length (10,000+ words scores full, 5,000–10,000 scores 80%, < 5,000 scores 40%), sentence-level discipline (compression test passing on top 10 cited pages scores full, partial passes graded). Most common failure: answer capsules over 280 characters that get truncated by AI surfaces, dropping the brand reference. Tier 3 of the 14-tier framework covers this signal group.
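Several of the graded thresholds above can be expressed directly as functions. This is a sketch under stated assumptions: "first clause" is taken to mean the text before the first comma, period, semicolon, or colon, and counts or lengths that the text does not grade (for example 3–4 or 8–11 FAQ questions) return `None` rather than a guessed value. All function names are hypothetical.

```python
import re
from typing import Optional


def capsule_passes(capsule: str, brand: str) -> bool:
    """Full-credit shape: 220-280 characters with the brand in the first clause."""
    first_clause = re.split(r"[,.;:]", capsule, maxsplit=1)[0]
    return 220 <= len(capsule) <= 280 and brand.lower() in first_clause.lower()


def faq_count_credit(questions: int) -> Optional[float]:
    """Credit fraction for the number of Q&A pairs in FAQPage schema."""
    if 5 <= questions <= 7:
        return 1.0
    if questions >= 12:
        return 0.5
    if 1 <= questions <= 2:
        return 0.3
    return None  # not specified in the text


def llms_full_credit(word_count: int) -> float:
    """Credit fraction for llms-full.txt length in words."""
    if word_count >= 10_000:
        return 1.0
    if word_count >= 5_000:
        return 0.8
    return 0.4
```

Note that `capsule_passes` only identifies the full-credit shape; the 30–60% partial range for other capsules is a judgment the text does not pin down.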
Group 4 · Entity authority signals · RI-SG.04
Entity authority signals (20 points)
Entity authority signals cover Wikidata Q-IDs, Organization JSON-LD with sameAs array, Person JSON-LD for the founder, public founder bio with verifiable credentials, NAP consistency across third-party citation sources, third-party reinforcement cadence, and AI surface identity test results. The 20-point weighting is calibrated to reflect that entity authority is the long-game compounding tier; signals here build over months and years rather than weeks. The rubric is lenient on absolute thresholds (no fixed minimum number of sameAs URLs) and strict on consistency (mismatched NAP across directories is a hard failure).
Strict checks: Organization JSON-LD validates and has @id; Person JSON-LD validates and has @id where founder is named; canonical brand name appears the same way across all sameAs URLs that resolve. Graded checks: sameAs verification (each URL must resolve to a record about the same entity; broken or speculative URLs cost 2–4 points), Wikidata Q-ID presence (registered and surviving speedy-deletion review scores full, none scores zero, deleted Q-IDs score zero), NAP consistency (12+ directories matching scores full, 5–11 matching scores graded, fewer scores 30–50%), AI surface identity test results (canonical narrative on 4 of 5 surfaces scores full, 2–3 of 5 scores graded, 0–1 of 5 scores 25%). Most common failure: speculative sameAs URLs in Organization JSON-LD — LinkedIn URLs pointing to wrong individuals, Crunchbase URLs for differently-named companies. Tier 4 of the 14-tier framework covers this signal group.
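The NAP consistency sweep can be sketched as an exact comparison against one canonical record. Because the rubric treats formatting variants as mismatches, this example deliberately does not normalize phone or address formats. The `Nap` type, the helper names, and the band labels are hypothetical; the text gives no exact percentages for the 5–11 band, so the function returns a label rather than a number.

```python
from typing import NamedTuple


class Nap(NamedTuple):
    name: str
    address: str
    phone: str


def matching_directories(canonical: Nap, listings: dict[str, Nap]) -> list[str]:
    """Directories whose listing equals the canonical record exactly."""
    return [directory for directory, nap in listings.items() if nap == canonical]


def nap_band(matches: int) -> str:
    """Map a match count to the bands described in the text."""
    if matches >= 12:
        return "full"
    if matches >= 5:
        return "graded"
    return "30-50%"
```

A listing that differs only in phone punctuation would count as a mismatch here, which mirrors the strict-on-consistency stance described above.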
Group 5 · Content authority signals · RI-SG.05
Content authority signals (10 points)
Content authority signals cover pillar page existence and depth, supporting article cluster, original research artifacts, expert commentary cadence, methodology page density, FAQ page reference shape, and content cadence. The 10-point weighting is the lowest in the rubric because content authority compounds over the longest time horizon and is the most context-dependent: a B2B SaaS company should weight this differently from a local plumber. The rubric is calibrated lenient on absolute counts and strict on shape.
Strict checks: at least 3 pillar pages identifiable; methodology pages exist with citable claim density; FAQ pages are distinct from FAQPage schema and use plain-language question phrasing. Graded checks: pillar depth (1,800–3,500 words scores full, 1,200–1,799 scores 70%, < 1,200 scores 30%), supporting article cluster (3+ supporting articles per pillar with cross-links scores full, 1–2 scores 50%, none scores zero), original research presence (1+ pieces in past 12 months scores full, none scores 30%), content cadence (no gaps over 8 weeks scores full, gaps over 16 weeks score zero). Most common failure: pillar pages under 1,800 words that get bypassed in retrieval by competitor pillars with more depth. Tier 6 of the 14-tier framework covers this signal group.
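The graded content checks can be sketched the same way. The functions below encode only the bands the text states; pillar pages over 3,500 words and publishing gaps between 8 and 16 weeks are not graded in the text, so those cases return `None`. All names are hypothetical.

```python
from datetime import date
from typing import Optional


def pillar_depth_credit(words: int) -> Optional[float]:
    """Credit fraction for a pillar page's word count."""
    if 1800 <= words <= 3500:
        return 1.0
    if 1200 <= words < 1800:
        return 0.7
    if words < 1200:
        return 0.3
    return None  # above 3,500 words: not specified in the text


def cluster_credit(supporting_articles: int) -> float:
    """Credit fraction for cross-linked supporting articles per pillar."""
    if supporting_articles >= 3:
        return 1.0
    if supporting_articles >= 1:
        return 0.5
    return 0.0


def longest_gap_weeks(publish_dates: list[date]) -> float:
    """Longest interval between consecutive publish dates, in weeks."""
    ordered = sorted(publish_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return max(gaps, default=0) / 7


def cadence_credit(publish_dates: list[date]) -> Optional[float]:
    """Full credit with no gap over 8 weeks, zero with a gap over 16."""
    gap = longest_gap_weeks(publish_dates)
    if gap <= 8:
        return 1.0
    if gap > 16:
        return 0.0
    return None  # 8 to 16 weeks: not specified in the text
```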
Worked example: scoring a real site
Worked example · Handled Tax (Q1 2026) · RI-EX.01
Handled Tax pre-engagement audit
Handled Tax (handledtax.com, Amanda Han, CPA, all-50-states bookkeeping and tax preparation for online businesses) entered the engagement in Q1 2026 with a public AEO Readiness Index score of 41 out of 100. The score derivation by signal group:
- Foundation (14/25): HTTPS valid; sitemap exists but flat (12 of 18 points); robots.txt missing AI crawler allows (lost 6 points); GSC verified (full); BWT not verified (lost 1 point); IndexNow not configured (lost 2 points); security headers grade B (lost 4 points).
- On-page (8/15): title tags brand-first instead of query-first (lost 3 points); meta descriptions generic (lost 2 points); single H1 per page (full); BreadcrumbList missing on most pages (lost 2 points).
- AEO Core (5/30): no answer capsules on any page (lost 8 points); FAQPage with 14 questions on homepage (lost 6 points); no llms.txt (lost 5 points); no llms-full.txt (lost 4 points); no SpeakableSpecification (lost 2 points); no AI crawler allows in robots.txt (already counted under Foundation).
- Entity authority (8/20): Organization JSON-LD present but minimal sameAs (lost 4 points); no Wikidata Q-ID (lost 4 points); founder bio thin (lost 2 points); NAP consistent across 8 of 12 priority directories (lost 2 points).
- Content authority (6/10): 3 pillar pages identifiable; pillars 1,200–1,800 words (lost 2 points); supporting article cluster thin (lost 1 point); original research absent (lost 1 point).
Total: 41/100. Engagement focused on AEO Core (largest gap, 25-point opportunity) followed by Foundation cleanup. Re-scored at Q3 2026: 78/100. Citation rate measured separately lifted from 0% to 38% across 18 priority queries. Engagement structure documented in case studies; methodology covered across the 14-tier master volume and individual Tier 3, Tier 1, and Tier 4 books.
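The group-level arithmetic in this example can be reproduced in a few lines. The sketch below uses only the per-group scores and maximums quoted above; the variable names are illustrative.

```python
# Maximum points per signal group and the baseline scores quoted above.
MAX_POINTS = {
    "Foundation": 25, "On-page": 15, "AEO Core": 30,
    "Entity authority": 20, "Content authority": 10,
}
BASELINE = {
    "Foundation": 14, "On-page": 8, "AEO Core": 5,
    "Entity authority": 8, "Content authority": 6,
}
RESCORE_TOTAL = 78  # re-score reported in the text

baseline_total = sum(BASELINE.values())
gaps = {group: MAX_POINTS[group] - BASELINE[group] for group in MAX_POINTS}
largest_gap = max(gaps, key=gaps.get)

print(f"Baseline: {baseline_total}/{sum(MAX_POINTS.values())}")
print(f"Largest gap: {largest_gap} ({gaps[largest_gap]} points)")
print(f"Lift after re-score: {RESCORE_TOTAL - baseline_total} points")
```

Ranking the groups by remaining points is what surfaces AEO Core as the first priority, with Entity authority and Foundation next at 12 and 11 points.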
Common pitfalls per check
Pitfalls · The most-missed checks · RI-PT.01
Where audits commonly fail
- robots.txt that does not mention AI crawlers. The default template predates AI crawlers; with no explicit allow rules, any broad Disallow rule silently applies to ClaudeBot too. The check is binary; the loss is 6 points.
- Answer capsules over 280 characters. AI surfaces truncate; brand reference dropped; citation lost. The check looks for capsules of 220–280 chars with brand in first clause; the most common failure is capsule longer than 320 chars or capsule absent entirely.
- FAQPage with 12+ questions on a single page. Per-question authority signal dilutes; AI surfaces lift fewer sentences from each question. The check looks for 5–7 questions per page; 12+ scores 50% partial credit; 20+ scores 30%.
- Speculative sameAs URLs. A LinkedIn URL pointing to a different person; a Crunchbase URL for a similarly-named company. The check is verification: each sameAs URL must resolve to a record about the audited entity. Speculative entries cost 2–4 points.
- llms.txt that is just a robots-style allowlist. Misses the point of llms.txt — declaring citation language and entity identifiers, not just access permissions. The check looks for the canonical brand description, citation policy, and entity identifiers section; absence of any costs 1–2 points.
- NAP variants across directories. Different phone format on Yelp than on GBP; address line abbreviated differently on BBB. The check sweeps 12 priority directories; full match scores full; 5–11 matches scores partial; fewer scores 30–50%.
- Multiple H1s on a page. Allowed by HTML5 spec but confusing to AI surfaces. The check looks for single H1; multiple H1s scores zero on this binary check.
- Generic meta descriptions. “Learn more about our services” gets rewritten by Google; the check looks for descriptions with specific numbers, commitments, or distinctive language; generic phrasing scores 40%.
- llms-full.txt under 5,000 words. Too thin for AI surfaces to use as long-form context. The check looks for 5,000–15,000 words; under 5,000 scores 40%; under 2,000 scores zero.
- Pillar pages under 1,800 words. Bypassed in retrieval by competitor pillars with more depth. The check looks for 1,800–3,500 words; 1,200–1,799 scores 70%; under 1,200 scores 30%.
Calibration and methodology notes
Calibration · Methodology integrity · RI-CL.01
How the rubric stays calibrated
The AEO Readiness Index rubric is calibrated to produce scores that correlate with measured AI citation rate lift over the following quarter. Each rubric revision is tested by retrospectively scoring the past 30–60 client engagements at the time of audit, then comparing predicted lift (from the rubric) to actual lift (from quarterly citation rate measurement). Revisions that do not improve correlation are rolled back.
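The keep-or-roll-back decision described above can be illustrated with a plain correlation comparison. This is a sketch, not the calibration pipeline itself: it assumes Pearson correlation as the measure (the text says only "correlation"), and the `pearson` and `keep_revision` helpers and any figures fed to them are hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def keep_revision(actual_lift: list[float],
                  predicted_old: list[float],
                  predicted_new: list[float]) -> bool:
    """Keep a rubric revision only if it correlates better with measured lift."""
    return pearson(predicted_new, actual_lift) > pearson(predicted_old, actual_lift)
```

Each list holds one value per past engagement: the lift the rubric version predicted at audit time and the lift actually measured the following quarter.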
The 60% mechanical / 40% reasoning split is not a typo. 60% of the rubric is binary or simple-graded checks that an automated diagnostic can run in under 30 seconds (the public diagnostic at /audit/ implements this 60%). The remaining 40% requires deep entity-graph propagation analysis (Tier 4 reinforcement results), citation rate measurement on real queries (Tier 9 monitoring), or qualitative judgment on content shape (Tier 6 quality assessment). This 40%, which cannot be automated, is what separates a rubric you can deploy yourself from a rubric that benefits from consultant judgment; both fractions are necessary for the score to predict lift.
Calibration risks the rubric is aware of: regional variation (sites serving non-English-primary audiences may need locale-specific calibration; current rubric is US-English calibrated); industry vertical variation (healthcare and financial services have stricter regulatory disclosure requirements that affect Foundation scoring; current rubric does not yet weight by vertical); AI surface evolution (if a major model retrains and changes citation behavior on a specific signal, the rubric weight on that signal needs revision; this is reviewed quarterly).
Version log
Version log · Rubric history · RI-VL.01
Rubric revision history
- v0.1 — 2025-01: Initial 80-point rubric. 4 signal groups (Foundation 30 / On-page 20 / AEO Core 20 / Entity 10). Tested across 12 baseline audits.
- v0.2 — 2025-04: Added Content Authority as a fifth signal group (10 points). Re-weighted to 100 points total. Recalibrated Foundation downward (25) to make room.
- v0.3 — 2025-08: AEO Core re-weighted upward (20 → 25) based on Q2 2025 correlation analysis. Mechanical/reasoning split formalized at 60/40.
- v0.4 — 2025-12: AEO Core further re-weighted upward (25 → 30). On-page re-weighted downward (20 → 15) reflecting Google's shift toward entity/content authority weight. SpeakableSpecification check added.
- v0.5 — 2026-03: Entity Authority re-weighted upward (15 → 20). Content Authority unchanged at 10. Foundation unchanged at 25. Final 25/15/30/20/10 split.
- v0.6 — 2026-05 (current): llms-full.txt minimum bumped from 3,000 to 5,000 words. AI surface identity test results added as a graded check under Entity Authority. NAP consistency thresholds tightened (was 8 directories, now 12). The current rubric is what powers the public diagnostic at /audit/.
Future calibration plans (under evaluation): vertical-specific weighting for healthcare and financial services regulatory factors; locale-specific weighting for non-English-primary audiences; voice-surface citation weight increase if Tier 13 voice testing data shows accelerating influence.