Your Google Search Console looks healthy. Traffic is steady. Googlebot is crawling everything just fine.
And yet your content doesn't exist for ChatGPT, Perplexity, or Google AI Overviews.
Here's why: Googlebot is the exception, not the rule. Google invests significant infrastructure in rendering JavaScript before indexing. AI crawlers — GPTBot, PerplexityBot, ClaudeBot, anthropic-ai — do not. They send an HTTP request, receive whatever HTML is in the initial response, and move on. If that HTML is `<div id="root"></div>`, they index nothing. Your blog posts, product pages, and landing pages vanish entirely from AI citation systems.
Quick summary: If you're running a React SPA, a Next.js app with heavy client-side rendering, or anything built with Vite or Create React App without server rendering, AI systems have likely never seen a word of your content — even if Google ranks you fine.
Why AI Crawlers Can't Read Your JavaScript
The crawl gap nobody talks about
Most web developers learned SEO in a Googlebot world. Googlebot has, since 2015, been capable of rendering JavaScript — meaning a React app that builds its content client-side could still get indexed by Google. That shaped a generation of development decisions: SPAs were fine for SEO, Next.js client components were fine, CSR was fine, because Google would figure it out.
AI crawlers are not Googlebot. They are plain HTTP crawlers. When GPTBot or PerplexityBot requests your page, they get the raw HTML your server sends before any JavaScript executes. For a React SPA, that raw HTML looks something like this:
```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <title>My Company - Great Products</title>
    <meta name="description" content="We make great things." />
  </head>
  <body>
    <div id="root"></div>
    <script src="/static/js/main.abc123.js"></script>
  </body>
</html>
```
That <div id="root"></div> is all the crawler sees. The crawler logs a page visit. The LLM ingests an empty page. Your content is never trained on, never cited, never surfaced.
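To make that concrete, here's a hypothetical sketch of what a non-rendering crawler extracts from each kind of response. The `stripTags` function is a naive stand-in for a crawler's text extraction, not any real crawler's parser:

```javascript
// Naive text extraction: drop script bodies, drop tags, collapse whitespace.
function stripTags(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, "") // remove script elements entirely
    .replace(/<[^>]+>/g, " ")                    // replace all other tags with spaces
    .replace(/\s+/g, " ")
    .trim();
}

// A client-rendered SPA response: an empty shell.
const spaResponse = `<!DOCTYPE html><html><body>
  <div id="root"></div>
  <script src="/static/js/main.abc123.js"></script>
</body></html>`;

// A server-rendered response: real content in the initial HTML.
const ssrResponse = `<!DOCTYPE html><html><body>
  <article><h1>Great Products</h1><p>We make great things.</p></article>
</body></html>`;

console.log(JSON.stringify(stripTags(spaResponse))); // → ""
console.log(JSON.stringify(stripTags(ssrResponse))); // → "Great Products We make great things."
```

The SPA response yields an empty string: there is literally no text for the crawler to ingest.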
Which crawlers are affected
Every major AI crawler operates this way:
- GPTBot — OpenAI's training and retrieval crawler (used for ChatGPT)
- OAI-SearchBot — OpenAI's search-specific crawler
- PerplexityBot — Perplexity AI's crawler
- ClaudeBot and anthropic-ai — Anthropic's crawlers
- Google-Extended — Google's robots.txt token governing AI use of your content (a directive honored by Google's crawling infrastructure rather than a separate crawler)
The common thread: none of them execute JavaScript. They are HTTP request crawlers, full stop.
Why Googlebot's capabilities mislead you
Google has invested heavily in JavaScript rendering because it has to — the modern web is largely JS-driven, and refusing to render JS would make Google's index increasingly incomplete. That investment doesn't extend to AI companies. Running headless browsers at scale to render every page before ingestion is computationally expensive, and it isn't something GPT training pipelines were built to do.
The result is a crawl gap: your content may be perfectly indexed by Google while being completely absent from every AI training corpus and citation pool.
How to Check If Your Site Has This Problem
The fastest test requires nothing but a terminal:
```bash
curl -s -A "GPTBot" https://yoursite.com/blog/your-post | grep -i "your headline"
```
If that command returns your headline text, your page is server-rendered and AI crawlers can read it. If it returns nothing, you have a problem.
Run this on your five most important pages: your homepage, your highest-traffic blog post, your main product or service page, your pricing page, and any page you most want cited by AI.
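To check several pages in one pass, here is a sketch of a small audit script, assuming Node 18+ (for the global `fetch`). The user-agent string follows OpenAI's published GPTBot documentation, but verify the current value; the URLs and phrases are placeholders for your own pages:

```javascript
// UA string per OpenAI's GPTBot docs (verify the current published value).
const GPTBOT_UA =
  "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.0; +https://openai.com/gptbot";

// Pure check: is the phrase present in the raw, unrendered HTML?
function contentVisible(rawHtml, phrase) {
  return rawHtml.toLowerCase().includes(phrase.toLowerCase());
}

// Fetch each page as GPTBot would and report whether its key phrase appears.
async function auditPages(pages) {
  for (const { url, phrase } of pages) {
    const res = await fetch(url, { headers: { "User-Agent": GPTBOT_UA } });
    const html = await res.text();
    console.log(`${contentVisible(html, phrase) ? "OK  " : "MISS"} ${url}`);
  }
}

// Example invocation (replace with your real pages):
// auditPages([
//   { url: "https://yoursite.com/", phrase: "your homepage headline" },
//   { url: "https://yoursite.com/pricing", phrase: "pricing" },
// ]);
```

Any `MISS` line means that page's key content is not in the raw HTML and is invisible to AI crawlers.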
A second quick test: open the page in a browser, open DevTools, disable JavaScript, and reload. What you see is roughly what AI crawlers see. If the page is blank or shows a loading spinner, that content is invisible to AI.
How to Fix It: SSR and SSG by Framework
Next.js App Router (Next.js 13+)
If you're using the App Router, all components are React Server Components by default — they render on the server and produce real HTML in the initial response. The trap is 'use client'.
When you add 'use client' to a component, you're opting it into client-side rendering. That's appropriate for interactive UI (forms, modals, dropdowns). It is not appropriate for content-heavy components: article bodies, product descriptions, pricing tables, FAQ sections.
If your blog post component looks like this, you have a problem:
```jsx
'use client'

export default function BlogPost({ post }) {
  return (
    <article>
      <h1>{post.title}</h1>
      <div dangerouslySetInnerHTML={{ __html: post.content }} />
    </article>
  )
}
```
Remove 'use client' unless this component genuinely needs browser APIs or React hooks like useState. Content display components almost never need to be client components.
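One workable split, sketched under the assumption of an App Router project (`fetchPost` and `LikeButton` are hypothetical names): keep the article a Server Component and push interactivity into a small client child.

```jsx
// app/blog/[slug]/page.jsx — stays a Server Component (no 'use client'),
// so the article HTML ships in the initial response.
import LikeButton from './LikeButton' // hypothetical client component

export default async function BlogPost({ params }) {
  const post = await fetchPost(params.slug) // hypothetical data helper
  return (
    <article>
      <h1>{post.title}</h1>
      <div dangerouslySetInnerHTML={{ __html: post.content }} />
      {/* Only the interactive widget opts into client rendering: */}
      <LikeButton postId={post.id} />
    </article>
  )
}
```

With this split, crawlers get the full article text while the like button still hydrates in the browser.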
Next.js Pages Router
In the Pages Router, you must explicitly opt into server rendering. There are two functions to know:
getStaticProps — runs at build time, generates static HTML. Best for content that doesn't change frequently (blog posts, documentation, marketing pages).
```jsx
export async function getStaticProps({ params }) {
  const post = await fetchPost(params.slug)
  return {
    props: { post },
    revalidate: 3600, // ISR: regenerate every hour
  }
}

export default function BlogPost({ post }) {
  return (
    <article>
      <h1>{post.title}</h1>
      <div dangerouslySetInnerHTML={{ __html: post.content }} />
    </article>
  )
}
```
getServerSideProps — runs on every request, generates HTML dynamically. Best for pages that depend on request-time data.
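A minimal sketch, assuming a hypothetical `fetchListing` data helper:

```jsx
// Runs on every request; the rendered HTML reflects request-time data.
export async function getServerSideProps({ params }) {
  const listing = await fetchListing(params.id) // hypothetical data helper
  return { props: { listing } }
}

export default function Listing({ listing }) {
  return (
    <article>
      <h1>{listing.title}</h1>
      <p>{listing.description}</p>
    </article>
  )
}
```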
If your Pages Router pages use neither function and instead fetch data inside a useEffect, your content is rendered client-side and invisible to AI crawlers.
React with Vite or Create React App
Vite and CRA do not include server rendering out of the box. Your options, roughly in order of effort:
- Migrate to Next.js. This is the most thorough fix and the right long-term move for most teams.
- Use React Router with SSR. React Router v7 (formerly Remix) supports server rendering and is a reasonable alternative.
- Add pre-rendering. Tools like react-snap generate static HTML snapshots at build time. Services like Prerender.io render pages on request for crawlers.
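For a CRA project, react-snap is typically wired in as a postbuild step; a minimal package.json sketch (script names per the react-snap README, verify against your setup):

```json
{
  "scripts": {
    "build": "react-scripts build",
    "postbuild": "react-snap"
  }
}
```

After `npm run build`, react-snap crawls the built app in a headless browser and writes static HTML snapshots alongside the JS bundle.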
AI Crawler Visibility Checklist
- Run `curl -A "GPTBot" [url]` on your 5 most important pages — confirm content is present in the raw HTML response
- Check `robots.txt` — confirm it does not block GPTBot, PerplexityBot, ClaudeBot, anthropic-ai, or OAI-SearchBot
- Confirm all content-heavy pages use SSR or SSG, not client-side rendering
- Audit `'use client'` usage in Next.js App Router — remove it from any component that only displays content
- Add JSON-LD structured data (Article, FAQ, Organization) to key pages — see our guide to schema markup for AI
- Create `/llms.txt` — a plain-text file at your root describing your site and its key content
- Test with JavaScript disabled in DevTools — all critical content should be visible without JS
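A minimal `/llms.txt` sketch following the llms.txt proposal (all names and URLs below are placeholders):

```
# My Company

> We make great products, and this site documents what they do and what they cost.

## Key content
- [Pricing](https://yoursite.com/pricing): plans and what each includes
- [Blog](https://yoursite.com/blog): guides and product deep-dives
```

The format is plain markdown: an H1 title, a one-line blockquote summary, then sections linking to your most citation-worthy pages.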
What This Means for GEO
If you're thinking about Generative Engine Optimization — making your content citable by AI systems — server rendering is the baseline requirement. Everything else (structured data, authority signals, citation-worthy content) is irrelevant if AI crawlers can't read your pages in the first place.
The AI visibility stack, in order of dependency:
- Crawlability — bots must be allowed and able to reach your pages
- Renderability — pages must return content in initial HTML (SSR/SSG)
- Structure — content must be organized with proper headings, schema markup, and semantic HTML
- Authority — content must be trustworthy, cited externally, and meet E-E-A-T signals
You cannot skip step two. If you're running a Shopify store with a JavaScript-heavy theme, see our guide on making your Shopify store visible to ChatGPT.
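For step three, a minimal Article JSON-LD sketch that would sit in the page's `<head>` (all values are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Your Post Title",
  "datePublished": "2025-01-01",
  "author": { "@type": "Organization", "name": "Your Company" }
}
</script>
```

Note that this only helps once step two is solved: JSON-LD injected client-side by JavaScript is just as invisible to AI crawlers as the rest of your content.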
Frequently Asked Questions
Does Next.js automatically fix the AI crawler problem?
It depends on how you're using it. Next.js App Router server-renders components by default, which means your content should appear in initial HTML — as long as you haven't added 'use client' to content-heavy components. If your pages rely on client-side data fetching with useEffect, that data won't be in the initial HTML. Next.js Pages Router requires you to explicitly use getStaticProps or getServerSideProps. The safest check is always the curl test above.
How do I know if my React site is invisible to AI crawlers?
Run the curl test against your most important pages with the GPTBot user-agent and look for your content in the response. A second method: open Chrome DevTools, open the Command Menu (Cmd/Ctrl+Shift+P), run "Disable JavaScript," then reload. What renders without JavaScript is what AI crawlers see.
Does this affect my Google rankings?
For most sites, not significantly — Googlebot renders JavaScript, so Google sees your content regardless of whether it's server-rendered. The primary impact of CSR is on AI visibility, not Google rankings — which is exactly why many teams have the problem without knowing it. Their ranking data looks clean while AI crawlers see nothing.