Your website gets crawled every week by at least six AI engines: Googlebot (AI Overviews), OAI-SearchBot (ChatGPT), GPTBot (OpenAI training), PerplexityBot, ClaudeBot (Anthropic), and Bingbot (Copilot).
Each one is evaluating whether your content is worth citing in AI-generated answers. Most of them leave without citing anything, not because your content is bad, but because technical blockers, structural gaps, or missing or incorrect trust signals get in the way.
The Checklist
1. AI Crawlers Are Allowed in robots.txt
Why it matters: If your robots.txt blocks ChatGPT, Perplexity, or other AI crawlers, your content doesn't get indexed — and can't be cited. Full stop.
How to check: Visit yourdomain.com/robots.txt. Look for rules that Disallow GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot, Google-Extended, or anthropic-ai.
How to fix: Add explicit Allow rules for each bot you want to permit. If you have a blanket Disallow: / for unrecognized bots, whitelist these individually.
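As a sketch, a robots.txt that pairs a blanket disallow with explicit allowances for the AI crawlers above might look like this (bot names follow each vendor's published user-agent strings; adjust the default rule to your own policy):

```text
# Explicitly allow AI crawlers
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

# Default rule for all other bots
User-agent: *
Disallow: /
```

Crawlers that match a named group ignore the `User-agent: *` group entirely, so the explicit Allow rules take precedence for those bots.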
2. You Have an llms.txt File
Why it matters: llms.txt is an emerging standard that gives AI models a curated, machine-readable overview of your most important content. Pages listed in llms.txt are indexed and cited more reliably than pages discovered through standard crawling alone.
How to check: Visit yourdomain.com/llms.txt. If you get a 404, you don't have one.
How to fix: Create a markdown-formatted text file at your domain root with sections for your company description, key pages, services, and notable content. List URLs in markdown link format. This takes about 30 minutes and is zero-risk.
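A minimal llms.txt sketch, using a hypothetical company and example.com placeholder URLs:

```markdown
# Acme Analytics
> Acme Analytics builds reporting dashboards for e-commerce teams.

## Key pages
- [Pricing](https://www.example.com/pricing): Plans and feature comparison
- [Product tour](https://www.example.com/tour): Overview of the dashboard

## Guides
- [Getting started](https://www.example.com/docs/start): Setup in about 10 minutes
```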
3. Organization Schema Is Present and Complete
Why it matters: Schema markup is the technical layer that tells AI engines what your business is, not just what your website says. Organization schema with a complete name, URL, description, and contact information is the minimum viable trust signal for any business site.
How to check: Run your homepage through Google's Rich Results Test, or view the page source and search for "@type": "Organization".
How to fix: Add JSON-LD Organization schema to your homepage <head>. Include: name, url, description, logo, contactPoint, sameAs (your social profiles), and foundingDate.
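A sketch of that JSON-LD, placed in the homepage <head> (company name, URLs, and contact details below are hypothetical placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Acme Analytics",
  "url": "https://www.example.com",
  "description": "Reporting dashboards for e-commerce teams.",
  "logo": "https://www.example.com/logo.png",
  "foundingDate": "2015",
  "contactPoint": {
    "@type": "ContactPoint",
    "contactType": "customer support",
    "email": "support@example.com"
  },
  "sameAs": [
    "https://www.linkedin.com/company/example",
    "https://twitter.com/example"
  ]
}
</script>
```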
4. FAQPage Schema on Key Pages
Why it matters: Pages with FAQPage schema are 3.2× more likely to appear in Google AI Overviews, the highest citation multiplier of any schema type. AI engines extract FAQ content directly into answers and credit the source. Important note: incomplete or generic schema can actually hurt citation rates (an 18-point penalty vs. no schema). Only implement FAQPage if you fill every field properly.
How to fix: Add an FAQ section to your most important landing pages and service pages. Mark it up with FAQPage + Question + Answer JSON-LD. Keep answers concise (50–150 words) and genuinely helpful — not marketing copy.
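As an illustration, a single-question FAQPage block might look like this (the question and answer text are placeholders; the markup must match the FAQ content visible on the page):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is GEO?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Generative Engine Optimization (GEO) is the practice of structuring content so AI search engines can extract and cite it."
    }
  }]
}
</script>
```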
5. Content Follows Answer-First Structure
Why it matters: AI models extract answers programmatically. Content that buries the answer in paragraph 4 after 300 words of preamble gets skipped. Content that answers the question in the first 1–3 sentences after an H2 heading gets extracted and cited.
How to fix: Rewrite section headings as questions ("What is GEO?" instead of "About GEO"). Rewrite the opening sentence of each section to directly answer that question before expanding with context.
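A before-and-after sketch of the same section, rewritten answer-first (hypothetical content):

```markdown
<!-- Before: vague heading, answer buried after preamble -->
## About GEO
In recent years, the search landscape has shifted dramatically...

<!-- After: question heading, direct answer in sentence one -->
## What is GEO?
GEO (Generative Engine Optimization) is the practice of structuring
content so AI engines can extract and cite it. Unlike traditional
SEO, it optimizes for answer extraction rather than rankings...
```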
6. Author and E-E-A-T Signals Are Visible
Why it matters: 96% of Google AI Overview citations come from sources with strong E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) signals. For Perplexity, anonymous content is rarely cited at all.
How to fix: Add author bylines to every piece of content. Create author profile pages with credentials, photo, and links to their professional profiles (LinkedIn, personal site). Mark authors up with Person schema including jobTitle, affiliation, and url.
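A sketch of Person schema for an author profile page (name, title, and URLs are hypothetical):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Smith",
  "jobTitle": "Head of SEO",
  "url": "https://www.example.com/authors/jane-smith",
  "affiliation": {
    "@type": "Organization",
    "name": "Acme Analytics"
  },
  "sameAs": ["https://www.linkedin.com/in/janesmith-example"]
}
</script>
```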
7. Publish Dates Are Visible and Schema-Confirmed
Why it matters: Freshness is a primary ranking signal for Perplexity and an important one for Google. AI engines de-prioritize content with no visible publish date or with outdated dates.
How to fix: Display publish and update dates prominently, ideally near the article title. Add datePublished and dateModified to your Article JSON-LD. When you update old content, update the dateModified field.
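A minimal Article sketch showing both date fields (headline, dates, and author are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "GEO Readiness Checklist",
  "datePublished": "2024-06-01",
  "dateModified": "2025-01-15",
  "author": { "@type": "Person", "name": "Jane Smith" }
}
</script>
```

When you revise the page, bump dateModified so it matches the visible update date.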
8. Page Speed Under 3 Seconds
Why it matters: Pages that load slowly are crawled less frequently and indexed less completely. The speed-citation relationship is steep: pages with an FCP under 0.4 seconds average 6.7 AI citations per page, while pages with FCP over 1.13 seconds average only 2.1 citations, a 3× gap that tracks load time alone.
How to fix: Run your URL through Google PageSpeed Insights and aim for LCP under 2.5 seconds. Common quick wins: compress images, enable caching, defer non-critical JavaScript, use a CDN.
9. Sitemap Is Submitted and Current
Why it matters: A current, submitted sitemap ensures AI crawlers can discover all of your important pages. Pages not in the sitemap or not indexed by Google are largely invisible to ChatGPT.
How to fix: Visit yourdomain.com/sitemap.xml. Submit it to Google Search Console. Remove pages you don't want indexed (thin content, duplicates) to concentrate crawl budget on your best pages.
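For reference, a minimal sitemap.xml entry looks like this (URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/pricing</loc>
    <lastmod>2025-01-15</lastmod>
  </url>
</urlset>
```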
10. Brand Mentions Exist on Third-Party Sites
Why it matters: Brand search volume has the highest correlation with AI citations — higher than backlinks, domain authority, or even content quality. AI engines infer credibility from how often a brand appears in third-party editorial contexts.
How to fix: The highest-leverage activities are: guest articles in industry publications, podcast appearances, being quoted as an expert in news stories, and building genuine presence on Reddit and Hacker News. Focus on earned mentions — paid directories and press releases have near-zero correlation with AI citations.
How Many Did You Pass?
8–10: You're in good shape. Focus on content quality and expanding your brand mention footprint.
5–7: You have meaningful gaps. Start with robots.txt, schema, and answer-first structure — these are the fastest fixes with the highest impact.
0–4: You have critical blockers. AI engines are either unable to crawl your content or unable to trust it. Fix access first, then trust signals, then structure.