Do ChatGPT, Claude, and Perplexity use the same retrieval system?
No, and our understanding is observational rather than vendor-confirmed. Based on what we have seen, ChatGPT browsing appears to combine Google and Bing index data with OpenAI's own crawler signals. Claude web search appears to draw on Brave and Bing alongside Anthropic's own crawler. Perplexity runs its own retrieval and re-ranking pipeline with citations exposed directly to the user. The base layer of crawlable HTML, clean structured data, and consistent entity signals appears to serve all of them; the specifics of how a page is selected and quoted differ and can change without notice.
Which AI crawlers does a site need to allow in robots.txt?
For coverage across the major answer engines, allow GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, anthropic-ai, PerplexityBot, Perplexity-User, Google-Extended, and Applebot-Extended at minimum. Disallowing any of these can keep the corresponding platform from using the site for real-time retrieval, for training, or both, depending on the token. In our experience this is the most common cause of a site disappearing from one platform without an obvious content reason.
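A quick way to audit this is to parse the live robots.txt and ask, agent by agent, whether the homepage is fetchable. The sketch below uses Python's standard urllib.robotparser; the sample file, the example.com URL, and the blocked_agents helper are illustrative assumptions, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the answer above.
AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "anthropic-ai",
    "PerplexityBot", "Perplexity-User",
    "Google-Extended", "Applebot-Extended",
]


def blocked_agents(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI agents that this robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks one agent and allows everything else.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(sample))
```

Running the same helper against the production file after every deploy catches the case where a blanket Disallow rule added for one bot silently removes the site from a platform.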
Which search indexes does ChatGPT actually use?
Based on Canonry's observations, ChatGPT browsing appears to draw on Google index data, Bing index data, and OpenAI's own crawler signals together, with entity grounding layered on top. OpenAI has not published a definitive specification, so this should be treated as observed behavior rather than confirmed mechanics. The practical implication: many of the signals that help a page rank well in Bing or Google, plus clean HTML and consistent structured data, also tend to help ChatGPT cite the page.
Which search indexes does Claude actually use?
Based on Canonry's observations, Claude web search appears to draw on Brave index data, Bing index data, and Anthropic's own crawler signals. Anthropic has not published a definitive specification, so this should be treated as observed behavior rather than confirmed mechanics. Pages that present clean structure, reputable third-party corroboration, and verbatim-quotable passages tend to be cited more reliably than pages relying on assertion alone.
What structured data matters most for AI citations?
The types that matter most are Organization (or LocalBusiness for place-based businesses) with sameAs pointing to canonical external profiles, Service for each offering, Article for editorial content, BreadcrumbList for navigation context, FAQPage where the matching question and answer pairs exist in the visible body, and Person for named experts. Stable @id values across these blocks turn isolated markup into a connected entity graph that retrievers can reason over.
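To make the "connected entity graph" idea concrete, the sketch below builds a small JSON-LD @graph in which Service and Person nodes point back to the Organization through its @id, then checks that no reference dangles. The example.com origin, the entity names, the URL patterns for @id, and the dangling_refs helper are all hypothetical; only the schema.org types and properties (provider, worksFor, sameAs) are real.

```python
import json

BASE = "https://example.com"  # hypothetical site origin
ORG_ID = f"{BASE}/#organization"  # one stable @id reused everywhere

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": ORG_ID,
            "name": "Example Co",
            "url": BASE,
            "sameAs": ["https://www.linkedin.com/company/example-co"],
        },
        {
            "@type": "Service",
            "@id": f"{BASE}/services/audit#service",
            "name": "Example Audit",
            "provider": {"@id": ORG_ID},  # link by reference, not by copy
        },
        {
            "@type": "Person",
            "@id": f"{BASE}/team/jane-doe#person",
            "name": "Jane Doe",
            "worksFor": {"@id": ORG_ID},
        },
    ],
}


def dangling_refs(doc: dict) -> set[str]:
    """Return @id references that point at no node defined in the graph."""
    defined = {node["@id"] for node in doc["@graph"]}
    referenced = set()
    for node in doc["@graph"]:
        for value in node.values():
            # A pure reference is a dict whose only key is "@id".
            if isinstance(value, dict) and set(value) == {"@id"}:
                referenced.add(value["@id"])
    return referenced - defined


print(json.dumps(graph, indent=2))
print("dangling references:", dangling_refs(graph))
```

Referencing the Organization by @id rather than repeating its fields in every block means there is one record to keep correct, which is the property that makes the markup read as a single entity.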
Does llms.txt actually help with answer-engine visibility?
It is not a guaranteed ranking signal, but it removes ambiguity for browsing-capable systems by providing a curated, citation-ready map of the site. Pair it with a longer llms-full.txt for fuller context. Both must stay in sync with the visible site; drift between the two files and the rendered HTML undermines trust in the source.
Why do the same prompt and the same model return different answers on different runs?
LLM responses are non-deterministic. Sampling parameters, retrieval recency, server-side personalization, and ongoing model updates all introduce variance. Useful measurement averages across multiple runs per prompt and across at least two model versions per platform, then watches trends rather than single answers.
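The averaging described above can be sketched as follows. The run data, the model version labels, and the function names are invented for illustration; the point is only that a citation rate is computed per model version over several runs and then averaged, rather than read off a single answer.

```python
from statistics import mean


def citation_rate(runs: list[bool]) -> float:
    """Fraction of runs in which the site was cited (0.0 if there were no runs)."""
    return sum(runs) / len(runs) if runs else 0.0


def platform_rate(results: dict[str, list[bool]]) -> float:
    """Average the per-version citation rates so each model version counts equally."""
    return mean(citation_rate(runs) for runs in results.values())


# Hypothetical data: one prompt, two model versions, four runs each.
# True means the site was cited in that run.
results = {
    "model-version-a": [True, False, True, True],
    "model-version-b": [False, False, True, False],
}

for version, runs in results.items():
    print(version, citation_rate(runs))
print("platform average:", platform_rate(results))
```

Tracking the platform average over successive measurement windows, instead of reacting to any one run, is what separates a real trend from sampling noise.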
Should NYC businesses optimize separately for each platform?
No. The shared foundation (crawlable HTML, structured data, consistent entity record, external corroboration, AI-readable files) accounts for the majority of cross-platform lift. Platform-specific work, like adjusting source pages for Perplexity's dense citation behavior or tuning entity grounding for ChatGPT, is worth doing once the base layer is solid, not before.
Will browsing answer engines see content rendered by JavaScript?
Some can, but with less reliability than server-rendered HTML. The safest approach is to ensure the answer to any question the page is meant to win is present in the initial HTML response, with critical headings, definitions, and structured data delivered without requiring client-side execution.
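One way to test this is to take the raw HTML exactly as the server returns it, ignore everything inside script and style tags, and check whether the target answer text is present. The sketch below does that with Python's standard html.parser; the two sample pages and the answer_in_initial_html helper are hypothetical, and in practice the input would be the unrendered response body for the page.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that is present without executing any JavaScript."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def answer_in_initial_html(html: str, phrase: str) -> bool:
    """True if `phrase` appears in the text of the initial HTML response."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return phrase.lower() in text


# Hypothetical pages: same answer, delivered two different ways.
server_rendered = (
    "<html><body><h1>What is AEO?</h1>"
    "<p>AEO is answer engine optimization.</p></body></html>"
)
client_rendered = (
    '<html><body><div id="root"></div>'
    '<script>render("AEO is answer engine optimization.")</script></body></html>'
)

phrase = "AEO is answer engine optimization"
print(answer_in_initial_html(server_rendered, phrase))
print(answer_in_initial_html(client_rendered, phrase))
```

The second page contains the same sentence, but only as a string inside a script, so a fetcher that does not execute JavaScript never sees it as content.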
How do Google AI Overviews relate to AEO work for ChatGPT or Perplexity?
AI Overviews sit on top of Google search results and draw heavily on the Knowledge Graph and ranked pages. The strongest cross-platform sites tend to perform across all answer surfaces because the underlying signals (entity authority, structured data, extractable content) overlap. Optimizing exclusively for one surface tends to leave value on the table.
How long does it take for new content to show up in AI answers?
Browsing-mode citations can appear within days once the page is crawled and indexed. Training-mode references, where a page becomes part of a model's parametric knowledge, can take a full model retraining cycle and are not guaranteed. Most measurable change in the first 30 to 90 days comes from the browsing layer, not from new training.
How does Canonry measure visibility across these platforms?
Citation rate and share of voice against named competitors are tracked across ChatGPT, Claude, Perplexity, Gemini, and Copilot on a fixed cadence, with multiple runs per prompt and at least two model versions per platform. AI crawler hits and referral traffic from each platform are joined to that view from server logs to separate retrieval reach from answer-level citation.
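The server-log half of that view can be approximated with a small script that counts hits per platform by user-agent token. This is a minimal sketch, not Canonry's pipeline: it assumes combined log format with the user agent as the last quoted field, the token-to-platform mapping is our own simplification, and the sample log lines are invented.

```python
import re
from collections import Counter

# Assumed mapping from user-agent token to platform, for illustration only.
AGENT_TO_PLATFORM = {
    "GPTBot": "ChatGPT",
    "OAI-SearchBot": "ChatGPT",
    "ChatGPT-User": "ChatGPT",
    "ClaudeBot": "Claude",
    "Claude-User": "Claude",
    "PerplexityBot": "Perplexity",
    "Perplexity-User": "Perplexity",
}

# In combined log format the user agent is the last quoted field on the line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def crawler_hits(log_lines: list[str]) -> Counter:
    """Count requests per platform based on the user-agent string."""
    hits: Counter = Counter()
    for line in log_lines:
        match = UA_FIELD.search(line)
        if not match:
            continue
        user_agent = match.group(1).lower()
        for token, platform in AGENT_TO_PLATFORM.items():
            if token.lower() in user_agent:
                hits[platform] += 1
                break
    return hits


# Invented sample lines; IPs are from documentation-reserved ranges.
sample_log = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [01/Jan/2025:00:00:02 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.7 - - [01/Jan/2025:00:00:03 +0000] "GET /faq HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '203.0.113.8 - - [01/Jan/2025:00:00:04 +0000] "GET /faq HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"',
    '198.51.100.9 - - [01/Jan/2025:00:00:05 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"',
]

print(crawler_hits(sample_log))
```

User-agent strings can be spoofed, so counts like these are best read as an upper bound on crawler reach unless the requests are also verified against the vendors' published IP ranges.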
What is the single biggest cross-platform lever for an NYC business?
A clean, internally consistent entity record. That means one canonical Organization or LocalBusiness JSON-LD block with a stable @id, sameAs links to verified profiles (LinkedIn, Crunchbase, Wikidata, Google Business Profile where applicable), and a matching name, address, and phone number (NAP) across the site, schema, footer, and external directories. Most platform-specific issues sit downstream of this layer.
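NAP consistency is easy to check mechanically once the records are collected in one place. The sketch below normalizes name, address, and phone from each source and reports which sources disagree with the schema record. The business details, the source labels, and the normalization rules (strip punctuation, lowercase, keep the last ten phone digits for US numbers) are all assumptions made for the example.

```python
import re


def normalize_text(value: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", value.lower()).split())


def normalize_phone(phone: str) -> str:
    """Keep digits only; assume US numbers and compare the last ten digits."""
    return re.sub(r"\D", "", phone)[-10:]


def nap_key(record: dict[str, str]) -> tuple[str, str, str]:
    return (
        normalize_text(record["name"]),
        normalize_text(record["address"]),
        normalize_phone(record["phone"]),
    )


def nap_mismatches(sources: dict[str, dict[str, str]], reference: str = "schema") -> list[str]:
    """Return the labels of sources whose NAP differs from the reference source."""
    expected = nap_key(sources[reference])
    return [label for label, rec in sources.items() if nap_key(rec) != expected]


# Hypothetical records gathered from the site and one external directory.
sources = {
    "schema": {"name": "Example Co", "address": "123 Main St, New York, NY 10001", "phone": "+1 (212) 555-0100"},
    "footer": {"name": "Example Co.", "address": "123 Main St., New York, NY 10001", "phone": "212-555-0100"},
    "directory": {"name": "Example Co", "address": "123 Main Street, New York, NY 10001", "phone": "212.555.0100"},
}

print(nap_mismatches(sources))
```

The check deliberately does not expand abbreviations, so "St" versus "Street" is flagged; whether that difference matters to a given retriever is unknown, and treating it as a mismatch is the conservative choice.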