IONIAN Blog — Technical SEO
Are You Accidentally Invisible to AI Search? A 15-Minute Crawler Audit
A default CDN or robots.txt setting can quietly block the crawlers that get you cited in ChatGPT, Perplexity, and Google AI, without blocking the ones that train on you. Here is the 15-minute audit to check.
Here is a failure mode almost no one checks for: your site is great, your content is strong, and ChatGPT, Perplexity, and Google's AI answers still never mention you — because a default setting somewhere quietly told the AI crawlers to go away. In 2026 this is common, fixable, and costing brands citations they don't even know they're missing.
Why this is suddenly a problem
The open scraping free-for-all is closing. Cloudflare, which sits in front of a large share of the web, now blocks AI crawlers by default for new sites and offers a pay-per-crawl model that lets publishers allow, charge, or block each bot. Millions of sites now disallow AI training, and a meaningful percentage block the major AI user-agents outright.
Most of that is deliberate and reasonable. The problem is collateral damage: a default CDN configuration or an overly broad robots.txt rule can block the crawlers that cite you right alongside the ones that train on you — and those are not the same bots. If you want to show up in AI answers, you have to let the right crawlers in on purpose.
Training crawlers vs. answer crawlers — know the difference
This is the distinction that fixes most of the problem:
- Training crawlers collect data to train future models (for example, GPTBot, Google-Extended, ClaudeBot, and the older anthropic-ai token). Blocking these keeps your content out of training sets. That's a values/IP decision, and blocking them does not remove you from live answers.
- Answer / search crawlers index or fetch pages to build a cited response (for example, OAI-SearchBot and ChatGPT-User, PerplexityBot, Claude-SearchBot and Claude-User, plus Googlebot feeding AI Overviews). These are the ones that get you cited. Block them and you vanish from the answer.
You can absolutely opt out of training while staying fully visible in AI search. Many teams accidentally do the opposite.
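To see that split in practice, here is a minimal sketch using Python's standard-library robots.txt parser. The sample file, the example.com URL, and the short list of bots are assumptions for illustration; swap in your own robots.txt and the user-agents you care about. Python's parser is not identical to any vendor's, so treat the result as a first pass rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opts out of one training crawler,
# keeps /admin/ closed to everyone, and leaves the rest open.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def crawler_access(robots_txt: str, bots: list[str], url: str) -> dict[str, bool]:
    """Return {bot name: whether robots_txt lets that bot fetch url}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


if __name__ == "__main__":
    bots = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "Googlebot"]
    report = crawler_access(SAMPLE_ROBOTS_TXT, bots, "https://example.com/blog/post")
    for bot, allowed in report.items():
        print(f"{bot:15} {'allowed' if allowed else 'BLOCKED'}")
```

With this sample file, GPTBot is blocked while the three answer/search bots can still reach the blog post, which is the "opt out of training, stay visible" setup described above.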
The 15-minute audit
1. Read your own robots.txt. Open yoursite.com/robots.txt. Look for Disallow: / under any AI user-agent, and for a blanket rule (User-agent: * followed by Disallow: /) that catches the answer crawlers along with everything else. Make sure the search/answer bots above are allowed to reach your real content.
2. Check your CDN's bot settings. If you're on Cloudflare or similar, find the AI-bot / "block AI scrapers" toggle and confirm it isn't silently blocking the answer crawlers you want. A default-on block is the single most common culprit.
3. Confirm your content renders without JavaScript. Many AI fetchers read the initial HTML and don't run heavy client-side rendering. If your key content only appears after a JavaScript bundle executes, the crawler may see an empty shell. Server-render or statically generate the important pages. (This is why we ship static HTML for crawlers on every build.)
4. Make sure missing pages return a real 404. If every wrong URL returns a 200 with your homepage, crawlers and monitoring tools can't tell what's real — and broken assets hide in plain sight.
5. Point to a real sitemap and keep it current. The answer crawlers and traditional search both use it to find your best pages.
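Steps 2 through 5 can be roughed out in a short script. This is a sketch under stated assumptions, not a full audit tool: the function names, the /sitemap.xml path, and the bogus URL are all hypothetical, and sending a bot's User-Agent string from your own machine is only a hint. Many CDNs verify crawlers by IP address, so a spoofed request may be treated differently from the real bot in either direction.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def fetch(url: str, user_agent: str, timeout: float = 10.0) -> tuple[int, str]:
    """GET url with the given User-Agent header; return (status code, body text)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses raise, but still carry the status and body we want.
        return error.code, error.read().decode("utf-8", errors="replace")


def missing_from_initial_html(html: str, key_phrases: list[str]) -> list[str]:
    """Step 3: phrases absent from the raw HTML, so probably injected by JavaScript."""
    lowered = html.lower()
    return [phrase for phrase in key_phrases if phrase.lower() not in lowered]


def looks_like_soft_404(status: int) -> bool:
    """Step 4: a URL that should not exist answered with a success status."""
    return 200 <= status < 300


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Step 5: <loc> entries of a sitemap (or sitemap index); [] if it is not valid XML."""
    try:
        root = ET.fromstring(sitemap_xml)
    except ET.ParseError:
        return []
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def run_audit(site: str, bot: str, key_phrases: list[str]) -> dict[str, object]:
    """Run the checks against a live site. `site` has no trailing slash."""
    home_status, home_html = fetch(f"{site}/", bot)
    bogus_status, _ = fetch(f"{site}/no-such-page-audit-check", bot)  # hypothetical path
    _, sitemap_xml = fetch(f"{site}/sitemap.xml", bot)  # assumed sitemap location
    return {
        "homepage_status": home_status,  # step 2: a 403 here hints at a CDN block
        "missing_phrases": missing_from_initial_html(home_html, key_phrases),
        "soft_404": looks_like_soft_404(bogus_status),
        "sitemap_url_count": len(sitemap_urls(sitemap_xml)),
    }
```

A call such as run_audit("https://example.com", "PerplexityBot", ["Pricing"]) returns one small dictionary per bot. Run it once per answer crawler you care about and compare the results against a normal browser User-Agent.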
A sane starting allow-list
A reasonable default for a business that wants AI-search visibility: allow the answer/search crawlers, decide deliberately about training crawlers, and keep private routes blocked for everyone. That looks like allowing OAI-SearchBot, ChatGPT-User, PerplexityBot, Claude-SearchBot, Claude-User, and Googlebot to reach your public content, while choosing whether to allow or disallow the training bots (GPTBot, Google-Extended, ClaudeBot, anthropic-ai) based on how you feel about training use. One caution: some user-triggered fetchers don't strictly honor robots.txt, so anything truly private belongs behind authentication, not just a disallow line.
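One way to keep that policy explicit is to generate robots.txt from a short list of decisions. The sketch below is illustrative only: build_robots_txt and is_allowed are hypothetical helpers, the private paths and sitemap URL are placeholders, and the bot lists are deliberately short, so extend them with the crawlers you actually want to allow or block.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(
    answer_bots: list[str],
    blocked_training_bots: list[str],
    private_paths: list[str],
    sitemap_url: str,
) -> str:
    """Render a robots.txt: answer bots allowed, training bots blocked, private paths closed to all."""
    private_rules = [f"Disallow: {path}" for path in private_paths]
    groups = []
    for bot in answer_bots:
        # A bot that matches its own group ignores the "*" group, so the private
        # rules are repeated here. They come before "Allow: /" because some
        # parsers (including Python's) apply the first matching rule.
        groups.append([f"User-agent: {bot}", *private_rules, "Allow: /"])
    for bot in blocked_training_bots:
        groups.append([f"User-agent: {bot}", "Disallow: /"])
    # Everyone else: public content open, private paths closed.
    groups.append(["User-agent: *", *(private_rules or ["Disallow:"])])
    body = "\n\n".join("\n".join(group) for group in groups)
    return f"{body}\n\nSitemap: {sitemap_url}\n"


def is_allowed(robots_txt: str, bot: str, url: str) -> bool:
    """Sanity-check the generated file with Python's own parser before deploying it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, url)


ROBOTS_TXT = build_robots_txt(
    answer_bots=["OAI-SearchBot", "PerplexityBot", "Googlebot"],
    blocked_training_bots=["GPTBot"],
    private_paths=["/admin/", "/account/"],  # hypothetical private routes
    sitemap_url="https://example.com/sitemap.xml",  # placeholder
)

if __name__ == "__main__":
    print(ROBOTS_TXT)
```

Naming each answer crawler in its own group is a design choice: it records the intent in the file itself, so a later blanket change to the wildcard group doesn't silently lock those bots out.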
The upside
This is one of the rare SEO fixes that is fast, free, and high-leverage. Fifteen minutes of checking can be the difference between being a source AI answers quote and being a brand they've never heard of. If you'd rather have someone audit the whole crawl-and-citation path — robots, CDN, rendering, schema, and the page structure that actually earns the citation — that's the core of our technical SEO and GEO work. Pair this with our 4-engine AI-search guide, and reach out if you want the audit done for you.