robots.txt generator: build robot directives with 2026 AI bot rules.
robots.txt is the original web protocol for telling crawlers what they may and may not access. In 2026 it has new urgency because 12-plus major AI bots now crawl the web for LLM training and retrieval-augmented generation. Sites that want to be cited by ChatGPT must leave OpenAI's crawlers unblocked: GPTBot, plus OAI-SearchBot, which OpenAI's crawler documentation names as the bot behind ChatGPT search results. Sites that want to opt out of LLM training must explicitly block GPTBot and the other training crawlers.
About this tool.
robots.txt has been the standard for telling crawlers what to access since 1994. In 2026 it has new significance because of the explosion of AI-training and AI-search bots that respect (or claim to respect) the protocol. The ability to be cited by ChatGPT, Perplexity, Claude, and Gemini now depends on allowing the right bot user-agents.
The 2026 canonical AI bot list now includes at minimum: GPTBot and ChatGPT-User from OpenAI, ClaudeBot from Anthropic, PerplexityBot, Google-Extended (controls Gemini training without affecting Search), CCBot (Common Crawl, which feeds LLM training corpora), Applebot-Extended (Apple Intelligence), Bytespider (ByteDance / TikTok), and Amazonbot. Address each one explicitly so your intent, whether training inclusion or training exclusion, is unambiguous; a bot with no group of its own falls back to whatever your User-agent: * rules say.
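As a rough illustration of addressing each bot explicitly, the sketch below builds one robots.txt group per user-agent from the list above. The function name build_ai_rules and its allow flag are hypothetical and are not the generator's actual code.

```python
# Bot names come from the list above; the helper itself is illustrative only.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
    "Google-Extended", "CCBot", "Applebot-Extended", "Bytespider", "Amazonbot",
]


def build_ai_rules(bots, allow=True):
    """Return robots.txt text with one group per bot, all allowed or all blocked."""
    directive = "Allow: /" if allow else "Disallow: /"
    groups = [f"User-agent: {bot}\n{directive}" for bot in bots]
    # Groups are separated by a blank line, as is conventional in robots.txt.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Print an opt-out file covering the first two bots in the list.
    print(build_ai_rules(AI_BOTS[:2], allow=False))
```

Passing allow=True produces the opt-in variant; mixing allowed and blocked bots would need a per-bot setting, which this sketch leaves out.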
Most small business sites should allow all of them. The risk of being trained on is generally lower than the risk of being absent from LLM responses when users ask the AI for vendor recommendations in your industry. ThatDeveloperGuy explicitly allows every major AI bot across all 130-plus client sites we host.
There are real exceptions. Health-record portals, financial transaction pages, customer admin areas, and copyrighted content libraries should be blocked. The generator above lets you specify paths to disallow universally while still allowing AI bots on your public content.
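One detail is easy to miss here: a crawler that finds a group naming it follows only that group and ignores the User-agent: * rules, so the private paths have to be repeated inside every bot-specific group. The sketch below shows one way to do that and checks the result with Python's standard urllib.robotparser. The paths /admin/ and /account/ and both helper names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

PRIVATE_PATHS = ["/admin/", "/account/"]  # hypothetical private areas
AI_BOTS = ["GPTBot", "ClaudeBot"]         # subset of the bot list, for brevity


def build_robots(private_paths, bots):
    """Build robots.txt text that repeats the private paths in every group."""
    disallows = "\n".join(f"Disallow: {path}" for path in private_paths)
    groups = [f"User-agent: *\n{disallows}"]
    for bot in bots:
        # Disallow lines go before "Allow: /" because Python's parser applies
        # the first matching rule; longest-match parsers give the same answer.
        groups.append(f"User-agent: {bot}\n{disallows}\nAllow: /")
    return "\n\n".join(groups) + "\n"


def can_fetch(robots_txt, agent, path):
    """Ask the standard-library parser whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    robots_txt = build_robots(PRIVATE_PATHS, AI_BOTS)
    print(can_fetch(robots_txt, "GPTBot", "/admin/users"))  # False
    print(can_fetch(robots_txt, "GPTBot", "/blog/post"))    # True
```

If the bot groups contained only Allow: /, the same check would report /admin/users as fetchable for GPTBot, which is the failure this layout avoids.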
Declaring your sitemap in robots.txt is still useful in 2026. Google reads it as a hint; Bing and other engines treat it more authoritatively. Always include your sitemap.xml URL at the bottom of your robots.txt for redundancy with your GSC submission.
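A quick way to confirm the declaration is actually readable is to parse the file and list the sitemaps it declares. This minimal sketch uses RobotFileParser.site_maps() from the Python standard library (available since Python 3.8). The example.com URL and the helper name declared_sitemaps are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Placeholder file contents; substitute your own robots.txt text.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""


def declared_sitemaps(robots_txt):
    """Return the Sitemap URLs declared in robots_txt (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(declared_sitemaps(ROBOTS_TXT))  # ['https://example.com/sitemap.xml']
```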
ThatDeveloperGuy's own robots.txt at thatdeveloperguy.com/robots.txt is updated quarterly to reflect new AI bot user-agents as they emerge. We do not block any major AI crawler. Net effect on Google rankings: zero. Net effect on ChatGPT and Perplexity citations: measurable lift in our 2026 internal testing.
FAQ.
Should I block GPTBot to prevent ChatGPT from training on my content?
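Probably not, unless you have specific copyright concerns. Much of the training corpus has already been collected, so blocking now does not undo past training; it only keeps new content out of future runs. Visibility in ChatGPT Search (the live retrieval system) is governed by a separate crawler, OAI-SearchBot, according to OpenAI's crawler documentation, so a blanket block on OpenAI's bots would cost you that visibility without preventing past training. ThatDeveloperGuy recommends allowing GPTBot for most sites.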
What's the difference between GPTBot and ChatGPT-User?
GPTBot is OpenAI's training crawler — content it fetches feeds future model training. ChatGPT-User fetches URLs on demand when users ask ChatGPT to browse a specific URL during a conversation. Both should be allowed for maximum visibility.
Does blocking Google-Extended affect my Google Search rankings?
No. Google-Extended controls only whether your content can be used to train Gemini and Vertex AI. It is a robots.txt control token rather than a separate crawler, and Googlebot (the regular Google Search crawler) is not affected by rules addressed to Google-Extended.
How often should I update my robots.txt?
Quarterly at minimum. New AI bots emerge regularly (the list grew from 4 in 2023 to 12-plus in 2026). Add new user-agents as they are announced.
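A quarterly review can be partly automated. The sketch below, under the assumption that you maintain your own list of known bot names, reports which of them have no User-agent line in a given robots.txt. The function name missing_user_agents is hypothetical.

```python
def missing_user_agents(robots_txt, known_bots):
    """Return the bots from known_bots that have no User-agent line of their own."""
    declared = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            declared.add(value.strip().lower())
    # Compare case-insensitively, but report names as given in known_bots.
    return [bot for bot in known_bots if bot.lower() not in declared]


if __name__ == "__main__":
    current = "User-agent: GPTBot\nAllow: /\n"
    print(missing_user_agents(current, ["GPTBot", "ClaudeBot"]))  # ['ClaudeBot']
```

A bot reported as missing is not necessarily blocked or allowed; it simply falls back to the User-agent: * group.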
Can I block AI bots while still ranking in Google?
Yes. Blocking GPTBot, ClaudeBot, PerplexityBot, etc. has no effect on Googlebot or Bingbot. Your Google ranking is unaffected, but expect to lose most or all of your visibility in ChatGPT, Claude, and Perplexity responses.
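The independence of the groups can be checked locally. This sketch blocks three AI crawlers in one shared group and asks Python's standard urllib.robotparser what each agent may fetch. The file contents and the helper name is_allowed are illustrative, not a recommended configuration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file: three AI crawlers share one blocking group.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, agent, path="/"):
    """Return True if agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot", "Googlebot", "Bingbot"):
        # The AI crawlers are refused; the search crawlers use the * group.
        print(agent, is_allowed(ROBOTS_TXT, agent))
```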
Is robots.txt actually enforced?
It's an honor-system protocol. All major AI companies (OpenAI, Anthropic, Google, Perplexity, ByteDance) publicly commit to respecting it. Smaller scrapers may ignore it. For hard enforcement, use server-side rate limiting and bot detection (Cloudflare Bot Management, Fastly).
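To make the idea of server-side enforcement concrete, here is a deliberately simplified stand-in: a WSGI middleware that returns 403 when the User-Agent header contains a blocked token. It is not how Cloudflare Bot Management or Fastly work, the token list and the name block_bots are hypothetical, and because User-Agent strings are trivially spoofed it only stops bots that identify themselves honestly.

```python
BLOCKED_AGENT_TOKENS = ("GPTBot", "CCBot")  # illustrative choice of tokens


def block_bots(app, blocked_tokens=BLOCKED_AGENT_TOKENS):
    """Wrap a WSGI app so requests from blocked user-agents receive 403."""
    lowered = tuple(token.lower() for token in blocked_tokens)

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in lowered):
            # Refuse the request before it reaches the wrapped application.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware
```

Wrapping an existing application is a one-liner, for example application = block_bots(application), where application is whatever WSGI callable your server already runs.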
Built by Joseph W. Anady at ThatDeveloperGuy. Need professional help? Get a free 48-hour audit.