llms.txt
llms.txt is a proposed standard: a markdown file placed at the root of a website that tells LLMs and AI agents which pages on the site are canonical for citation. Jeremy Howard proposed it in 2024 as the AI-era equivalent of robots.txt plus sitemap.xml for retrieval-augmented generation systems.
Also called: llms.txt file, AI citation manifest · Last updated: May 27, 2026 · By Joseph W. Anady
Why it matters.
The llms.txt convention puts a single file at https://yoursite.com/llms.txt containing a curated index of your most important URLs with brief markdown descriptions. The proposal mirrors the simplicity of robots.txt: humans can read it, machines can parse it, no special parser required. An extended llms-full.txt format provides full content for AI ingestion.
How it works.
An llms.txt file uses a minimal markdown structure: an H1 with the site name, an optional blockquote with a description, and H2 sections that group canonical URLs as markdown links with brief explanations. The AI agent or LLM crawler fetches the file, parses the markdown, and uses it to prioritize which pages on the site to ingest or cite. The format is intentionally human-readable so site owners can maintain it directly.
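The structure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference parser: the sample file, the `LlmsTxt` class, and the `parse_llms_txt` function are hypothetical names made up for this example, and the regex only handles the simple `- [title](url): note` link form.

```python
import re
from dataclasses import dataclass, field

# Hypothetical sample file following the H1 / blockquote / H2 layout.
SAMPLE = """\
# Example Site

> Short description of what the site covers.

## Docs

- [Getting started](https://yoursite.com/docs/start): Install and first steps
- [API reference](https://yoursite.com/docs/api): Endpoints and parameters

## Policies

- [Pricing](https://yoursite.com/pricing)
"""

# One list item: a markdown link, optionally followed by ": note".
LINK = re.compile(
    r"^[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    sections: dict = field(default_factory=dict)  # heading -> [(title, url, note)]


def parse_llms_txt(text):
    """Parse the minimal llms.txt structure into an LlmsTxt object."""
    doc = LlmsTxt()
    current = None  # name of the H2 section being filled
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and not doc.summary:
            doc.summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK.match(line)
            if match:
                doc.sections[current].append(
                    (match["title"], match["url"], match["note"] or "")
                )
    return doc


doc = parse_llms_txt(SAMPLE)
for heading, links in doc.sections.items():
    print(heading, [url for _, url, _ in links])
```

Because the format is plain markdown, a line-by-line scan like this is usually enough; a full markdown parser is only worth it if the file uses richer constructs.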
2026 reality check.
llms.txt does NOT drive direct LLM citation in May 2026. The major AI engines (ChatGPT, Perplexity, Claude, Gemini) do not consume the file as input for RAG. Where llms.txt does matter is in the emerging agentic web layer: AI coding assistants (Claude Code, Cursor, GitHub Copilot) fetch it constantly when working with documentation sites, and the Model Context Protocol (MCP) ecosystem increasingly references it. Treat llms.txt as future-proofing infrastructure, not as a source of current citation lift.
Data points
- Proposed by Jeremy Howard (Answer.ai) in September 2024
- No statistically significant correlation with LLM citation frequency (515M-event analysis, 2026)
- <0.1% of citation-driving bot requests touch /llms.txt (aeoengine.ai 2026 study)
- Used heavily by AI coding agents (Claude Code, Cursor, GitHub Copilot) for documentation navigation
- Format: minimal markdown with H1 site name + H2 sections of canonical URLs
First-hand insight from ThatDeveloperGuy.
ThatDeveloperGuy ships llms.txt across the entire ThatWebHostingGuy substrate (130+ client sites) as of early 2026. The cost is negligible: one file per site, generated from existing sitemap data via our aio-surfaces toolkit. The honest assessment is that the empirical evidence for direct citation lift is sobering. A 2026 analysis of 515 million LLM bot traffic events found NO statistically significant correlation between llms.txt presence and citation frequency on ChatGPT, Perplexity, or Claude. Less than 0.1 percent of citation-driving bot requests touch /llms.txt.
How TDG approaches it
TDG generates llms.txt, llms-full.txt, aeo.json, entity.json, brand.json, and ai.txt from a single typed site config using our open-source aio-surfaces toolkit (PyPI, MIT licensed). Files are regenerated on every site build and shipped through the deployment pipeline. We treat it as future-proofing: the agentic web layer is growing fast even though no direct LLM citation impact is measurable today.
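The idea of rendering llms.txt from one typed config can be sketched as follows. This is not the aio-surfaces API; `Page`, `SiteConfig`, and `render_llms_txt` are hypothetical names invented for illustration, and the sketch covers only llms.txt, not the other generated files.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Page:
    title: str
    url: str
    note: str = ""  # optional brief description


@dataclass
class SiteConfig:
    name: str
    summary: str = ""
    sections: Dict[str, List[Page]] = field(default_factory=dict)


def render_llms_txt(cfg: SiteConfig) -> str:
    """Render the H1 / blockquote / H2 link-list layout from a config."""
    lines = [f"# {cfg.name}", ""]
    if cfg.summary:
        lines += [f"> {cfg.summary}", ""]
    for heading, pages in cfg.sections.items():
        lines += [f"## {heading}", ""]
        for page in pages:
            suffix = f": {page.note}" if page.note else ""
            lines.append(f"- [{page.title}]({page.url}){suffix}")
        lines.append("")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


cfg = SiteConfig(
    name="Example Site",
    summary="Short description.",
    sections={
        "Docs": [
            Page("Start", "https://yoursite.com/docs/start", "First steps"),
            Page("API", "https://yoursite.com/docs/api"),
        ]
    },
)
print(render_llms_txt(cfg), end="")
```

Calling a function like this from the build step, and writing the result to the site root, keeps the file in sync with the config without manual edits.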
Common mistakes.
- Treating llms.txt as a direct ChatGPT citation lever (the 515M-event study found no such effect)
- Listing every page on the site, which defeats the purpose of a curated index
- Failing to update when canonical pages change (must stay accurate)
- Skipping llms-full.txt for sites with substantial content (extended format helps agentic AI)
- Hiding llms.txt behind authentication or letting it return a 404 (it must be publicly fetchable at the site root)
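Several of the mistakes above can be caught with a small lint step in the build. The sketch below is one possible check, not an established tool: `lint_llms_txt` is a hypothetical function, and the `max_share` threshold of 0.3 is an assumption loosely based on the 15-30 URLs per 100 pages figure mentioned in the FAQ.

```python
import re

# Captures the URL of every markdown link in the file.
LINK = re.compile(r"\[[^\]]+\]\(([^)\s]+)\)")


def lint_llms_txt(text, total_pages, max_share=0.3):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("missing H1 site name on the first line")
    urls = LINK.findall(text)
    if not urls:
        problems.append("no markdown links found")
    dupes = sorted({u for u in urls if urls.count(u) > 1})
    if dupes:
        problems.append("duplicate URLs: " + ", ".join(dupes))
    # Assumed heuristic: too many links relative to site size suggests
    # the file is a sitemap dump rather than a curated index.
    if total_pages and len(urls) > max_share * total_pages:
        problems.append(
            f"{len(urls)} links for {total_pages} pages; "
            "looks like a sitemap, not a curated index"
        )
    return problems


sample = "# Site\n\n## Docs\n\n- [A](https://yoursite.com/a)\n"
print(lint_llms_txt(sample, total_pages=100))
```

Failing the build when the list is non-empty is one way to keep the file accurate as canonical pages change.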
FAQ.
Does llms.txt help me appear in ChatGPT search?
No, not directly as of May 2026. The 515 million bot-event analysis found no statistically significant citation correlation. Implement it for future-proofing in the agentic web layer, not current ChatGPT visibility.
What's the difference between llms.txt and llms-full.txt?
llms.txt is the curated index — one file listing your canonical URLs with brief descriptions. llms-full.txt is the extended corpus — full content for AI ingestion. Most sites only need llms.txt. Sites with substantial documentation benefit from both.
Where do I put the file?
At the root of your site: https://yoursite.com/llms.txt. Same convention as robots.txt and sitemap.xml. Must be publicly fetchable, not behind authentication.
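A quick way to confirm the placement is to derive the root URL from any page on the site and check that it answers without credentials. The sketch below uses only the Python standard library; `llms_txt_url` and `is_publicly_fetchable` are hypothetical helper names, and the User-Agent string is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def llms_txt_url(page_url):
    """Map any URL on a site to that site's root-level llms.txt URL."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


def is_publicly_fetchable(url, timeout=10.0):
    """Return True if an unauthenticated GET ends in HTTP 200."""
    req = Request(url, headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers HTTP errors such as 401/403/404, DNS failures, and timeouts.
        return False


print(llms_txt_url("https://yoursite.com/docs/page?x=1#top"))
```

Note that `urlopen` follows redirects, so a site that redirects to a login page returning 200 would still pass; checking the response body for the leading H1 is a reasonable extra safeguard.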
Do I need to update it when content changes?
Yes. The whole point is that it represents your canonical current content. Regenerate whenever you publish or remove significant pages. Ideally automate via your build pipeline.
Should I list every page in llms.txt?
No. Curate. List your canonical, evergreen, high-quality pages that AI engines should treat as authoritative. Burying signal in noise defeats the purpose. Our typical llms.txt for a 100-page site lists 15-30 canonical URLs.
Maintained by Joseph W. Anady at ThatDeveloperGuy.