Platform Coverage

ChatGPT, Claude, and Perplexity optimization for NYC businesses

Strong visibility on one answer surface does not automatically carry over to the others. Cross-platform AEO is a shared retrieval problem with platform-specific variation, not a set of isolated hacks.

Each major answer engine appears to combine a retrieval pipeline (which pages it can fetch and rank), a grounding step (which entity it believes the page is about), and a citation policy (when it is willing to name a source by URL or by brand). The technical work below targets each of those layers across ChatGPT, Claude, Perplexity, Gemini, and Copilot.

Honest Caveat

Everything that follows is based on Canonry's observations and publicly available reporting. None of the major AI vendors publish a definitive specification of how their retrieval, grounding, and citation pipelines work, and the underlying mechanics can change without notice. Treat the platform-specific details as working hypotheses we have tested against client data, not confirmed architectures.

Documented Crawler Controls

ChatGPT (OpenAI)

OpenAI documents separate publisher controls rather than a full retrieval recipe. OAI-SearchBot governs discovery for ChatGPT search summaries and snippets; GPTBot governs potential model training. Keep pages crawlable and clearly structured, but treat access as eligibility, not a citation promise.

Perplexity

PerplexityBot is the crawler Perplexity uses to surface and link sites in search results, not to train foundation models. Perplexity-User handles a user-requested page visit and generally ignores robots.txt. Clear claims and supporting evidence help a page stand on its own, but neither control guarantees a citation.

Claude (Anthropic)

Anthropic documents three separate bots: Claude-SearchBot for search quality, Claude-User for user-requested retrieval, and ClaudeBot for potential model training. Anthropic does not publish a complete citation or ranking formula, so the durable work is clear structure, accurate claims, and corroborating sources.

Gemini (Google)

Googlebot controls Google Search crawling and indexing. Google-Extended is a separate product token for how Google may use content in Gemini Apps and Vertex AI grounding or model improvement; it does not affect Google Search indexing or ranking. Strong search fundamentals and an unambiguous entity record remain useful, without implying a Gemini citation guarantee.

Copilot (Microsoft)

Microsoft does not publish a complete Copilot retrieval or citation specification. Keep pages accessible to Bing and make the entity, service, and supporting evidence clear, but validate visibility with repeated real prompts rather than treating any crawler setting as a result.

Shared Technical Signals

Most cross-platform lift comes from the same six layers. Get these right before chasing platform-specific tactics.

These signals appear to compound across answer engines because the underlying retrieval and grounding stacks seem to share more than they differ. In our experience, a site that nails the foundation tends to perform across ChatGPT, Claude, Perplexity, Gemini, and Copilot at once.

Crawler access

Set controls by purpose, not from a single universal allowlist. Search and indexing bots, user-requested fetchers, and model-training or grounding controls have different effects. Check each provider's current documentation, then confirm your robots.txt does not accidentally block the access you intend to allow.
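
A minimal sketch of that confirmation step, using Python's standard urllib.robotparser. The bot tokens are the ones named in this article, but the grouping by purpose, the robots.txt content, and the example.com URL are assumptions for illustration. Real crawlers may interpret rules differently than this parser does, so treat the output as a sanity check rather than proof of access.

```python
from urllib.robotparser import RobotFileParser

# Tokens named in this article, grouped by purpose. The grouping is an
# assumption; confirm each token against the provider's current documentation.
BOTS_BY_PURPOSE = {
    "search": ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot", "Googlebot"],
    # Per the article, Perplexity-User generally ignores robots.txt, so a
    # "blocked" result for it here may not reflect real behavior.
    "user_requested": ["Claude-User", "Perplexity-User"],
    "training_or_grounding": ["GPTBot", "ClaudeBot", "Google-Extended"],
}

# Hypothetical robots.txt: block two training crawlers, allow everything else.
EXAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def audit_robots(robots_txt: str, url: str) -> dict[str, dict[str, bool]]:
    """Return, per purpose, whether each bot may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        purpose: {bot: parser.can_fetch(bot, url) for bot in bots}
        for purpose, bots in BOTS_BY_PURPOSE.items()
    }


if __name__ == "__main__":
    report = audit_robots(EXAMPLE_ROBOTS, "https://example.com/services/")
    for purpose, bots in report.items():
        for bot, allowed in bots.items():
            print(f"{purpose:22} {bot:18} {'allowed' if allowed else 'blocked'}")
```

Running the audit against a staging copy of robots.txt before deploying it is one way to catch a blanket Disallow that blocks more than intended.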

Structured data depth

Organization or LocalBusiness, Service, FAQPage, Article, BreadcrumbList, and Person JSON-LD with stable @id values across pages turn isolated markup into a connected graph that retrievers can reason over. FAQPage should only appear on pages whose visible body actually contains the matching question and answer pairs.
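
One way to keep @id values stable is to generate every page's JSON-LD from shared constants. The sketch below assumes a hypothetical business ("Example Co" at example.com) and a hypothetical URL scheme. The property names are standard schema.org vocabulary, but which types and properties a given answer engine actually reads is not something any vendor documents.

```python
import json

SITE = "https://example.com"  # hypothetical domain

# The same @id string is reused on every page that mentions the business,
# so separate JSON-LD blocks resolve to a single node.
ORG_ID = f"{SITE}/#organization"


def organization_node() -> dict:
    """The one canonical business node, identical wherever it appears."""
    return {
        "@type": "LocalBusiness",
        "@id": ORG_ID,
        "name": "Example Co",
        "url": f"{SITE}/",
        "sameAs": ["https://www.linkedin.com/company/example-co"],  # placeholder
    }


def service_page_graph(slug: str, service_name: str) -> dict:
    """Build the JSON-LD graph for one service page."""
    page_url = f"{SITE}/services/{slug}/"
    return {
        "@context": "https://schema.org",
        "@graph": [
            organization_node(),
            {
                "@type": "Service",
                "@id": f"{page_url}#service",
                "name": service_name,
                "url": page_url,
                # Reference the business by @id instead of repeating its fields.
                "provider": {"@id": ORG_ID},
            },
            {
                "@type": "BreadcrumbList",
                "@id": f"{page_url}#breadcrumb",
                "itemListElement": [
                    {"@type": "ListItem", "position": 1,
                     "name": "Home", "item": f"{SITE}/"},
                    {"@type": "ListItem", "position": 2,
                     "name": service_name, "item": page_url},
                ],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(service_page_graph("technical-audit", "Technical audit"), indent=2))
```

The printed JSON would go inside a script element of type application/ld+json in the server-rendered page.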

Entity consistency

Name, address, phone, and canonical URL must match across the site, schema, sameAs links, and external directories. Inconsistent signals fragment the entity record and reduce the confidence a retriever has at grounding time.
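
A consistency check can be sketched as a normalize-then-compare pass over every place the record appears. Everything in the example is hypothetical: the source labels, the sample values, and the normalization rules, which assume US phone numbers and deliberately do not expand abbreviations such as "St" versus "Street".

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Nap:
    name: str
    address: str
    phone: str
    url: str


def _squash(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def normalize(record: Nap) -> Nap:
    digits = re.sub(r"\D", "", record.phone)
    return Nap(
        name=_squash(record.name),
        address=_squash(record.address),
        phone=digits[-10:],  # assumes US numbers; drops a leading country code
        url=record.url.lower().rstrip("/"),
    )


def find_mismatches(sources: dict[str, Nap]) -> dict[str, dict[str, str]]:
    """Return each field whose normalized value differs between sources."""
    normalized = {label: normalize(rec) for label, rec in sources.items()}
    mismatches: dict[str, dict[str, str]] = {}
    for field in ("name", "address", "phone", "url"):
        values = {label: getattr(rec, field) for label, rec in normalized.items()}
        if len(set(values.values())) > 1:
            mismatches[field] = values
    return mismatches


# Hypothetical records gathered from three places.
SOURCES = {
    "footer": Nap("Example Co", "123 Example St, New York, NY 10001",
                  "(212) 555-0100", "https://example.com/"),
    "schema": Nap("Example Co", "123 Example St., New York, NY 10001",
                  "+1 212-555-0100", "https://example.com"),
    "directory": Nap("Example Co", "123 Example St, New York, NY 10001",
                     "212-555-0199", "https://example.com"),
}

if __name__ == "__main__":
    for field, values in find_mismatches(SOURCES).items():
        print(field, values)
```

In this sample only the phone field is reported, because the directory listing carries a different number; the punctuation and trailing-slash differences are normalized away.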

Server-rendered, extractable HTML

Critical answer content should be present in the initial HTML response, not injected client-side. Short declarative opening paragraphs, H2 questions above their answers, and numbered or definitional lists are easier for both retrievers and language models to lift verbatim.
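
A rough test for this is to look for the answer text in the raw HTML, outside script elements, without executing any JavaScript. The sketch below uses Python's standard html.parser; the two HTML snippets and the answer sentence are made up, and in practice raw_html would come from a plain HTTP GET of the page.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes outside script, style, and template elements."""

    SKIPPED_TAGS = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def answer_in_initial_html(raw_html: str, answer: str) -> bool:
    """True if `answer` appears in the text a non-JS fetcher could extract."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return " ".join(answer.split()).lower() in text


# Hypothetical pages: the first ships the answer in HTML, the second injects it.
SERVER_RENDERED = (
    "<main><h2>How long does an audit take?</h2>"
    "<p>Most audits take two weeks.</p></main>"
)
CLIENT_RENDERED = (
    '<div id="root"></div>'
    '<script>render("Most audits take two weeks.")</script>'
)

if __name__ == "__main__":
    answer = "Most audits take two weeks."
    print(answer_in_initial_html(SERVER_RENDERED, answer))
    print(answer_in_initial_html(CLIENT_RENDERED, answer))
```

The check is intentionally strict: text that exists only as a string inside a script counts as missing, which matches the conservative assumption that a fetcher may not run client-side code.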

AI-readable files

llms.txt and llms-full.txt at the site root give browsing-capable systems a curated, citation-ready map of the site. They do not replace structured data, but they reduce ambiguity about which pages matter and what each one is about, and they must stay in sync with the visible site to remain trustworthy.
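
Keeping the file in sync is easier when it is generated rather than hand-edited. The sketch below renders llms.txt from a page list and flags links that no longer exist on the site. The layout (a title line, a one-line summary, then sections of links) follows the public llms.txt proposal as we understand it; the Page type, the sample pages, and the live URL set are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    section: str
    title: str
    url: str
    summary: str


def render_llms_txt(site_name: str, tagline: str, pages: list[Page]) -> str:
    """Render an llms.txt body from a single list of pages."""
    lines = [f"# {site_name}", "", f"> {tagline}"]
    sections: dict[str, list[Page]] = {}
    for page in pages:
        sections.setdefault(page.section, []).append(page)
    for section, items in sections.items():
        lines += ["", f"## {section}", ""]
        lines += [f"- [{p.title}]({p.url}): {p.summary}" for p in items]
    return "\n".join(lines) + "\n"


def find_drift(llms_txt: str, live_urls: set[str]) -> list[str]:
    """Return URLs linked from llms.txt that are not in the live URL set."""
    linked = set(re.findall(r"\]\((https?://[^)\s]+)\)", llms_txt))
    return sorted(linked - live_urls)


# Hypothetical source of truth, e.g. exported from a CMS or sitemap.
PAGES = [
    Page("Services", "Technical audit", "https://example.com/audit/",
         "What the audit covers and how it is scored."),
    Page("Services", "Monitoring", "https://example.com/monitoring/",
         "Prompt-level visibility tracking."),
    Page("Guides", "AEO vs SEO", "https://example.com/aeo-vs-seo/",
         "What changes for AI-generated answers."),
]

if __name__ == "__main__":
    body = render_llms_txt("Example Co", "AEO services for NYC businesses.", PAGES)
    print(body)
    # Suppose the monitoring page was removed from the site after generation.
    live = {"https://example.com/audit/", "https://example.com/aeo-vs-seo/"}
    print(find_drift(body, live))
```

Running find_drift in a deploy check is one way to fail a build when the file points at pages that have been removed or renamed.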

External corroboration

Third-party citations, press, .gov and .edu mentions, GitHub or HuggingFace where relevant, and reputable directory listings raise the prior probability that a model treats the entity as legitimate enough to recommend by name in an answer.

What To Fix First

Audit robots.txt by purpose: search/indexing, user-requested retrieval, and model training or grounding. A blanket Disallow can block intended access, but an Allow does not promise inclusion.
Tighten entity signals: a single Organization or LocalBusiness JSON-LD block with a stable @id, sameAs links to canonical external profiles, and matching NAP (name, address, phone) across pages, footer, contact section, and structured data.
Make each commercial page answer one question in the first 80 words, in plain HTML, with the question stated as an H2 above the answer so retrievers can lift it cleanly.
Add or fix FAQPage schema on pages whose body already contains question-and-answer pairs. Do not duplicate the same FAQPage block across every subpage; the audit engine flags this as a cross-cutting issue.
Generate llms.txt and llms-full.txt from a single source of truth and keep them in sync with the visible site so they do not drift over time.
Track prompt-level visibility across ChatGPT, Claude, Perplexity, Gemini, and Copilot on a fixed cadence so changes are attributable rather than anecdotal.
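
The FAQPage item above lends itself to two simple checks: does every marked-up question appear in the visible body, and is the same block repeated across pages? The sketch below is an illustration under assumptions, not a description of how the Canonry audit engine works. The function names and sample data are hypothetical, and visible_text is assumed to have been extracted from the rendered page already.

```python
import hashlib
import json
from collections import defaultdict


def missing_questions(faq_jsonld: dict, visible_text: str) -> list[str]:
    """Return FAQPage question names that do not appear in the visible text."""
    haystack = " ".join(visible_text.lower().split())
    missing = []
    for item in faq_jsonld.get("mainEntity", []):
        question = item.get("name", "")
        if " ".join(question.lower().split()) not in haystack:
            missing.append(question)
    return missing


def duplicated_faq_blocks(pages: dict[str, dict]) -> list[list[str]]:
    """Group URLs whose FAQPage JSON-LD is byte-for-byte equivalent."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for url, faq in pages.items():
        canonical = json.dumps(faq, sort_keys=True).encode("utf-8")
        by_digest[hashlib.sha256(canonical).hexdigest()].append(url)
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]


# Hypothetical FAQPage block and page text.
FAQ = {
    "@type": "FAQPage",
    "mainEntity": [
        {"@type": "Question", "name": "How long does an audit take?",
         "acceptedAnswer": {"@type": "Answer", "text": "Most audits take two weeks."}},
        {"@type": "Question", "name": "Is the audit free?",
         "acceptedAnswer": {"@type": "Answer", "text": "Yes."}},
    ],
}
VISIBLE_TEXT = "How long does an audit take? Most audits take two weeks."

if __name__ == "__main__":
    print(missing_questions(FAQ, VISIBLE_TEXT))
    print(duplicated_faq_blocks({"/audit/": FAQ, "/pricing/": FAQ, "/about/": {}}))
```

A question reported as missing means either the markup or the page body needs to change; a duplicate group means the block was probably templated site-wide.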

How To Measure Cross-Platform Visibility

Citation tracking, not ranking tracking

The unit of measurement is whether the business is cited or named in an answer for a given prompt, not where a page sits in a SERP. Track both citation rate and share of voice against named competitors over time.

Multiple runs per prompt

LLM responses are non-deterministic. Sampling parameters, retrieval recency, and ongoing model updates all introduce variance. Sampling the same prompt across multiple runs and at least two model versions per platform gives a useful signal; a single run is noise.
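
The two metrics can be computed from a log of repeated runs. In the sketch below, citation rate is the fraction of runs that name the business, and share of voice is the business's mentions divided by all mentions of tracked brands. That share-of-voice definition is our assumption for illustration, and the Run type, brand names, and sample runs are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    platform: str
    model_version: str
    prompt: str
    brands_named: frozenset[str]  # brands cited or named in this answer


def citation_rate(runs: list[Run], brand: str) -> float:
    """Fraction of runs in which `brand` was cited or named."""
    if not runs:
        return 0.0
    return sum(brand in run.brands_named for run in runs) / len(runs)


def share_of_voice(runs: list[Run], brand: str, competitors: set[str]) -> float:
    """Brand mentions divided by mentions of all tracked brands."""
    tracked = {brand, *competitors}
    mentions: Counter[str] = Counter()
    for run in runs:
        for name in run.brands_named & tracked:
            mentions[name] += 1
    total = sum(mentions.values())
    return mentions[brand] / total if total else 0.0


# Four hypothetical runs of one prompt across two model versions.
PROMPT = "best commercial real estate broker in Brooklyn"
RUNS = [
    Run("platform-a", "v1", PROMPT, frozenset({"Example Co", "Rival A"})),
    Run("platform-a", "v1", PROMPT, frozenset({"Rival A"})),
    Run("platform-a", "v2", PROMPT, frozenset({"Example Co", "Rival A", "Rival B"})),
    Run("platform-a", "v2", PROMPT, frozenset()),
]

if __name__ == "__main__":
    print(f"citation rate:  {citation_rate(RUNS, 'Example Co'):.2f}")
    print(f"share of voice: {share_of_voice(RUNS, 'Example Co', {'Rival A', 'Rival B'}):.2f}")
```

With these sample runs the business is named in two of four answers (citation rate 0.50) and holds two of six tracked mentions (share of voice about 0.33), which shows why the two numbers should be read together.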

Crawler and referral logs

Server logs reveal which AI bots actually fetch the site and which pages they prefer. Referral analytics show which platforms send post-answer clicks, which is downstream of citations rather than a substitute for them.
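
A first pass over the logs can be as simple as counting requests per bot token and path. The sketch below assumes combined-format access logs and matches the tokens named in this article as substrings of the user agent. The sample log lines and their user-agent strings are invented for illustration, and because user agents can be spoofed, a real pipeline would also need to verify the requester, for example against IP ranges where a vendor publishes them.

```python
import re
from collections import Counter

# Tokens named in this article. Google-Extended is omitted: to our knowledge it
# is a robots.txt control token rather than a crawler that appears in logs.
AI_BOT_TOKENS = [
    "OAI-SearchBot", "GPTBot",
    "Claude-SearchBot", "Claude-User", "ClaudeBot",
    "PerplexityBot", "Perplexity-User",
]

# Combined log format: "METHOD path protocol" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def count_ai_bot_hits(lines: list[str]) -> Counter:
    """Count requests per (bot token, path) for recognized AI user agents."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue
        user_agent = match.group("ua").lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in user_agent:
                hits[(token, match.group("path"))] += 1
                break
    return hits


# Invented sample lines; the user-agent strings are placeholders.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:00 +0000] "GET /services/ HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '203.0.113.5 - - [01/Jan/2025:00:05:00 +0000] "GET /services/ HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:00:06:00 +0000] "GET /faq/ HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.9 - - [01/Jan/2025:00:07:00 +0000] "GET /faq/ HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (Macintosh) Safari/605.1.15"',
]

if __name__ == "__main__":
    for (bot, path), count in count_ai_bot_hits(SAMPLE_LOG).most_common():
        print(f"{bot:18} {path:12} {count}")
```

Joining these counts to the prompt-level citation data is what separates retrieval reach (a bot fetched the page) from answer-level citation (the page was actually named).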

Frequently Asked Questions

Do ChatGPT, Claude, and Perplexity use the same retrieval system?

No. The vendors publish different crawler controls and do not publish a shared retrieval or citation formula. The common foundation is public, accessible HTML; accurate structured data; a consistent entity; and claims that can be supported. How any platform selects or quotes a page can change without notice.

Which AI crawlers does a site need to allow in robots.txt?

There is no universal list. Decide separately whether to permit search/indexing, user-requested retrieval, and training or grounding. For example, OpenAI documents OAI-SearchBot for ChatGPT search and GPTBot for potential training; Anthropic documents Claude-SearchBot, Claude-User, and ClaudeBot for distinct uses; Perplexity documents PerplexityBot for search and Perplexity-User for user-requested visits; Google-Extended governs Gemini and Vertex AI use, not Google Search. Allowing a control permits that use; it does not guarantee a citation.

Which search indexes does ChatGPT actually use?

OpenAI does not publish a fixed list of search indexes or a complete retrieval specification. Its publisher guidance says OAI-SearchBot enables discovery for ChatGPT search summaries and snippets. Keep the page accessible to that bot and follow normal search-quality practices, but do not present any one index as the documented route to a ChatGPT citation.

Which search indexes does Claude actually use?

Anthropic does not publish a fixed list of search indexes or a complete citation formula. It does document Claude-SearchBot for search quality, Claude-User for user-requested retrieval, and ClaudeBot for potential training. Make content accessible according to the uses you intend to permit and assess outcomes through repeated real searches.

What structured data matters most for AI citations?

Organization (or LocalBusiness for place-based businesses) with sameAs pointing to canonical external profiles, Service for each offering, Article for editorial content, BreadcrumbList for navigation context, FAQPage where the matching question and answer pairs exist in the visible body, and Person for named experts. Stable @id values across these blocks turn isolated markup into a connected entity graph that retrievers can reason over.

Does llms.txt actually help with answer-engine visibility?

It is not a guaranteed ranking signal, but it removes ambiguity for browsing-capable systems by providing a curated, citation-ready map of the site. Pair it with a longer llms-full.txt for fuller context. Both must stay in sync with the visible site; drift between the two files and the rendered HTML undermines trust in the source.

Why do the same prompt and the same model return different answers on different runs?

LLM responses are non-deterministic. Sampling parameters, retrieval recency, server-side personalization, and ongoing model updates all introduce variance. Useful measurement averages across multiple runs per prompt and across at least two model versions per platform, then watches trends rather than single answers.

Should NYC businesses optimize separately for each platform?

No. The shared foundation (crawlable HTML, structured data, consistent entity record, external corroboration, AI-readable files) accounts for the majority of cross-platform lift. Platform-specific work, like adjusting source pages for Perplexity's dense citation behavior or tuning entity grounding for ChatGPT, is worth doing once the base layer is solid, not before.

Will browsing answer engines see content rendered by JavaScript?

Some can, but with less reliability than server-rendered HTML. The safest approach is to ensure the answer to any question the page is meant to win is present in the initial HTML response, with critical headings, definitions, and structured data delivered without requiring client-side execution.

How do Google AI Overviews relate to AEO work for ChatGPT or Perplexity?

AI Overviews sit on top of Google Search results and appear to draw heavily on the Knowledge Graph and ranked pages. The strongest cross-platform sites tend to perform across all answer surfaces because the underlying signals (entity authority, structured data, extractable content) overlap. Optimizing exclusively for one surface tends to leave value on the table.

How long does it take for new content to show up in AI answers?

Browsing-mode citations can appear within days once the page is crawled and indexed. Training-mode references, where a page becomes part of a model's parametric knowledge, can take a full model retraining cycle and are not guaranteed. Most measurable change in the first 30 to 90 days comes from the browsing layer, not from new training.

How does Canonry measure visibility across these platforms?

Citation rate and share of voice against named competitors are tracked across ChatGPT, Claude, Perplexity, Gemini, and Copilot on a fixed cadence, with multiple runs per prompt and at least two model versions per platform. AI crawler hits and referral traffic from each platform are joined to that view from server logs to separate retrieval reach from answer-level citation.

What is the single biggest cross-platform lever for an NYC business?

A clean, internally consistent entity record. One canonical Organization or LocalBusiness JSON-LD with a stable @id, sameAs links to verified profiles (LinkedIn, Crunchbase, Wikidata, Google Business Profile where applicable), and matching NAP across the site, schema, footer, and external directories. Most platform-specific issues sit downstream of this layer.

How Canonry Approaches It

We treat ChatGPT, Claude, Gemini, Copilot, and Perplexity as a shared visibility problem with platform-specific variation, not as a set of isolated hacks. The 16-factor on-site model covers the technical layer common to all of them; prompt-level monitoring across platforms surfaces where additional, platform-specific work is worth doing.

Related Resources

16-factor technical on-site AEO methodology covers the on-site layer that compounds across every platform above.
AEO vs SEO for NYC businesses covers what stays the same and what changes for AI-generated answers.
How to choose an NYC-based AEO agency covers what to evaluate and which red flags to avoid when picking a partner.
ChatGPT real estate AEO case study walks through an anonymized client that moved from zero ChatGPT visibility to top citation in roughly 4 weeks.
The free onsite technical audit applies the 16-factor scoring model used across all of the above.

Start with the free audit, then expand into prompt-level work.
