ChatGPT, Claude, and Perplexity optimization for NYC businesses

Strong visibility on one answer surface does not automatically carry over to the others. Cross-platform AEO is a shared retrieval problem with platform-specific variation, not a set of isolated hacks.

Each major answer engine appears to combine a retrieval pipeline (which pages it can fetch and rank), a grounding step (which entity it believes the page is about), and a citation policy (when it is willing to name a source by URL or by brand). The technical work below targets each of those layers across ChatGPT, Claude, Perplexity, Gemini, and Copilot.

Everything that follows is based on Canonry's observations and publicly available reporting. None of the major AI vendors publish a definitive specification of how their retrieval, grounding, and citation pipelines work, and the underlying mechanics can change without notice. Treat the platform-specific details as working hypotheses we have tested against client data, not confirmed architectures.

ChatGPT (OpenAI)

Based on our observations, ChatGPT browsing appears to draw on Google and Bing index data alongside OpenAI's own crawler signals, layered with entity grounding. Pages tend to be cited when GPTBot, OAI-SearchBot, and ChatGPT-User can crawl them, the entity is unambiguous across schema and external profiles, and the section that answers the question reads cleanly without surrounding chrome.

Perplexity

Citations are first-class and shown to the user, so source quality, page-level extractability, and a clear claim-to-evidence chain appear to matter more than on most platforms. PerplexityBot and Perplexity-User must be allowed in robots.txt, and dense, declarative passages tend to be favored over marketing prose.

Claude (Anthropic)

Based on our observations, Claude web search appears to draw on Brave and Bing index data alongside Anthropic's own crawler signals, with a conservative citation bias toward sources it can quote verbatim. ClaudeBot, Claude-User, and anthropic-ai need crawl access, and pages with clean structure, internal links, and reputable third-party corroboration tend to get cited more reliably than pages relying on assertion alone.

Gemini (Google)

Gemini appears to draw heavily on the Google index and Knowledge Graph, so traditional SEO signals plus a strong entity record (sameAs links to LinkedIn, Crunchbase, Wikidata, and a verified Google Business Profile where applicable) tend to carry meaningful weight. Google-Extended controls Gemini training access and should be set explicitly rather than left to defaults.

Copilot (Microsoft)

Copilot appears to lean on the Bing index, layered with Microsoft work and enterprise context. Anything that helps a page rank well in Bing, including consistent NAP (name, address, phone), fast HTML, and clear structured data, generally seems to help Copilot surface a citation as well.

Most cross-platform lift comes from the same six layers. Get these right before chasing platform-specific tactics.

These signals appear to compound across answer engines because the underlying retrieval and grounding stacks seem to share more than they differ. In our experience, a site that nails the foundation tends to perform across ChatGPT, Claude, Perplexity, Gemini, and Copilot at once.

Crawler access

Explicitly allow GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, anthropic-ai, PerplexityBot, Perplexity-User, Google-Extended, and Applebot-Extended in robots.txt. A blanket Disallow inherited from a template is the most common silent cause of disappearing from one platform without an obvious content reason.
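The check described above can be sketched with Python's standard-library robots.txt parser. This is a minimal illustration, not an audit tool: the sample robots.txt body and the example.com URL are hypothetical, and the standard-library parser's matching rules may differ in detail from how each vendor's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the text above.
AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "anthropic-ai",
    "PerplexityBot", "Perplexity-User",
    "Google-Extended", "Applebot-Extended",
]


def blocked_agents(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI agents that this robots.txt body would deny for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]


# Hypothetical template-inherited file: a blanket Disallow plus one explicit allow.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

if __name__ == "__main__":
    # Every agent without its own Allow group falls back to the blanket rule.
    print(blocked_agents(SAMPLE_ROBOTS))
```

Running the same function against the live robots.txt body (fetched separately) gives a quick list of agents that fall through to a blanket rule.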

Structured data depth

Organization or LocalBusiness, Service, FAQPage, Article, BreadcrumbList, and Person JSON-LD with stable @id values across pages turn isolated markup into a connected graph that retrievers can reason over. FAQPage should only appear on pages whose visible body actually contains the matching question and answer pairs.
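As a rough sketch of what "stable @id values" means in practice, the snippet below builds a LocalBusiness node and a Service node that refers back to it by @id. The domain, business name, profile URL, and the "#organization" and "#service" fragment conventions are all assumptions chosen for illustration; any scheme works as long as the same @id is reused on every page.

```python
import json

SITE = "https://example.com"           # hypothetical domain
ORG_ID = f"{SITE}/#organization"       # one @id reused on every page


def local_business() -> dict:
    """The canonical entity node; emitted once, referenced everywhere else."""
    return {
        "@type": "LocalBusiness",
        "@id": ORG_ID,
        "name": "Example Co",          # hypothetical business
        "url": f"{SITE}/",
        "sameAs": ["https://www.linkedin.com/company/example-co"],
    }


def service(slug: str, name: str) -> dict:
    """A Service node that points at the business by @id instead of repeating it."""
    return {
        "@type": "Service",
        "@id": f"{SITE}/services/{slug}#service",
        "name": name,
        "provider": {"@id": ORG_ID},
    }


def page_graph(*nodes: dict) -> str:
    """Serialize nodes as a single JSON-LD @graph for one page."""
    return json.dumps({"@context": "https://schema.org", "@graph": list(nodes)}, indent=2)


if __name__ == "__main__":
    print(page_graph(local_business(), service("kitchen", "Kitchen remodeling")))
```

Because the Service node carries only a reference, changing the business name or address in one place keeps every page's markup consistent.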

Entity consistency

Name, address, phone, and canonical URL must match across the site, schema, sameAs links, and external directories. Inconsistent signals fragment the entity record and reduce the confidence a retriever has at grounding time.
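One way to spot fragmented entity signals is to normalize each source's name, address, and phone and compare them against a reference copy. The sketch below assumes US-style ten-digit phone numbers and uses hypothetical records; its normalization is deliberately simple (it will not, for example, equate "St" with "Street"), so treat mismatches as prompts for manual review rather than verdicts.

```python
import re


def normalize_phone(raw: str) -> str:
    """Keep digits only, then the last ten (assumes US-style numbers)."""
    return re.sub(r"\D", "", raw)[-10:]


def normalize_text(raw: str) -> str:
    """Lowercase, drop light punctuation, collapse whitespace."""
    cleaned = re.sub(r"[.,#]", "", raw)
    return re.sub(r"\s+", " ", cleaned).strip().lower()


def nap_key(record: dict) -> tuple[str, str, str]:
    return (
        normalize_text(record["name"]),
        normalize_text(record["address"]),
        normalize_phone(record["phone"]),
    )


def mismatched_sources(records: dict[str, dict], reference: str) -> list[str]:
    """Return the sources whose normalized NAP differs from the reference source."""
    expected = nap_key(records[reference])
    return [src for src, rec in records.items() if nap_key(rec) != expected]


# Hypothetical records gathered from the site, its schema, and a directory.
RECORDS = {
    "site": {"name": "Example Co", "address": "123 Example St, New York, NY 10001",
             "phone": "(212) 555-0100"},
    "schema": {"name": "Example Co", "address": "123 Example St., New York, NY 10001",
               "phone": "+1 212-555-0100"},
    "directory": {"name": "Example Company", "address": "123 Example St, New York, NY 10001",
                  "phone": "212.555.0100"},
}

if __name__ == "__main__":
    print(mismatched_sources(RECORDS, reference="site"))
```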

Server-rendered, extractable HTML

Critical answer content should be present in the initial HTML response, not injected client-side. Short declarative opening paragraphs, H2 questions above their answers, and numbered or definitional lists are easier for both retrievers and language models to lift verbatim.
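A quick way to test whether answer content is present in the initial HTML is to parse the raw response without executing any JavaScript and see which H2 and first-paragraph pairs survive. The sketch below uses only the standard library; the two HTML samples are hypothetical, and the heuristic (each H2 paired with the first paragraph after it) is a simplification of how a real retriever might segment a page.

```python
from html.parser import HTMLParser


class H2AnswerExtractor(HTMLParser):
    """Collect each <h2> and the first <p> that follows it."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: list[list[str]] = []   # [question, answer] per H2
        self._capturing: str | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.pairs.append(["", ""])
            self._capturing = "h2"
        elif tag == "p" and self.pairs and not self.pairs[-1][1]:
            self._capturing = "p"

    def handle_endtag(self, tag):
        if tag == self._capturing:
            self._capturing = None

    def handle_data(self, data):
        if self._capturing == "h2":
            self.pairs[-1][0] += data
        elif self._capturing == "p":
            self.pairs[-1][1] += data


def extract_pairs(html: str) -> list[tuple[str, str]]:
    parser = H2AnswerExtractor()
    parser.feed(html)
    return [(" ".join(q.split()), " ".join(a.split())) for q, a in parser.pairs]


# Hypothetical responses: one server-rendered, one an empty client-side shell.
SERVER_RENDERED = "<h2>How long does a remodel take?</h2><p>Most take <b>six</b> weeks.</p>"
CLIENT_SHELL = '<div id="root"></div><script>render("<h2>How long?</h2>")</script>'

if __name__ == "__main__":
    print(extract_pairs(SERVER_RENDERED))
    print(extract_pairs(CLIENT_SHELL))
```

An empty result for a page that visibly shows questions and answers in the browser suggests the content is being injected client-side.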

AI-readable files

llms.txt and llms-full.txt at the site root give browsing-capable systems a curated, citation-ready map of the site. They do not replace structured data, but they reduce ambiguity about which pages matter and what each one is about, and they must stay in sync with the visible site to remain trustworthy.
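Keeping these files in sync is easier when they are generated rather than hand-edited. The sketch below renders an llms.txt body from a single page list, following the layout of the community llms.txt proposal as we understand it (a title line, a short summary, then sections of annotated links). The page records, field names, and site details are hypothetical.

```python
# Hypothetical single source of truth; in practice this might come from a CMS export.
PAGES = [
    {"section": "Services", "title": "Kitchen remodeling",
     "url": "https://example.com/services/kitchen",
     "summary": "Scope, timeline, and pricing for kitchen remodels."},
    {"section": "Services", "title": "Bathroom remodeling",
     "url": "https://example.com/services/bathroom",
     "summary": "Scope, timeline, and pricing for bathroom remodels."},
    {"section": "Company", "title": "About",
     "url": "https://example.com/about",
     "summary": "Who runs the business and where it operates."},
]


def render_llms_txt(site_name: str, summary: str, pages: list[dict]) -> str:
    """Render a title, a summary, and one link list per section."""
    sections: dict[str, list[dict]] = {}
    for page in pages:
        sections.setdefault(page["section"], []).append(page)

    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for section, items in sections.items():
        lines += [f"## {section}", ""]
        lines += [f"- [{p['title']}]({p['url']}): {p['summary']}" for p in items]
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def is_in_sync(published: str, pages: list[dict], site_name: str, summary: str) -> bool:
    """True when the published file matches what the source of truth would generate."""
    return published == render_llms_txt(site_name, summary, pages)


if __name__ == "__main__":
    print(render_llms_txt("Example Co", "Hypothetical NYC remodeling contractor.", PAGES))
```

Running `is_in_sync` against the deployed file in a scheduled job is one simple way to catch drift before it accumulates.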

External corroboration

Third-party citations, press, .gov and .edu mentions, GitHub or HuggingFace where relevant, and reputable directory listings raise the prior probability that a model treats the entity as legitimate enough to recommend by name in an answer.

  1. Audit robots.txt and confirm the major AI crawlers are explicitly allowed. A blanket Disallow inherited from a CMS template is the most common silent failure.
  2. Tighten entity signals: a single Organization or LocalBusiness JSON-LD block with a stable @id, sameAs links to canonical external profiles, and matching NAP across pages, footer, contact section, and structured data.
  3. Make each commercial page answer one question in the first 80 words, in plain HTML, with the question stated as an H2 above the answer so retrievers can lift it cleanly.
  4. Add or fix FAQPage schema on pages whose body already contains question-and-answer pairs. Do not duplicate the same FAQPage block across every subpage; our audit engine flags this as a cross-cutting issue.
  5. Generate llms.txt and llms-full.txt from a single source of truth and keep them in sync with the visible site so they do not drift over time.
  6. Track prompt-level visibility across ChatGPT, Claude, Perplexity, Gemini, and Copilot on a fixed cadence so changes are attributable rather than anecdotal.

Citation tracking, not ranking tracking

The unit of measurement is whether the business is cited or named in an answer for a given prompt, not where a page sits in a SERP. Track both citation rate and share of voice against named competitors over time.
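The two metrics can be made concrete with a small calculation. The definitions used here are assumptions for illustration: citation rate is the fraction of sampled answers that name the business, and share of voice is the business's mentions divided by all mentions of tracked brands. The brand names and run data are hypothetical.

```python
from collections import Counter


def citation_rate(runs: list[set[str]], brand: str) -> float:
    """Fraction of sampled answers in which `brand` is cited or named."""
    if not runs:
        return 0.0
    return sum(brand in named for named in runs) / len(runs)


def share_of_voice(runs: list[set[str]], brands: list[str]) -> dict[str, float]:
    """Each tracked brand's share of all tracked-brand mentions across runs."""
    counts = Counter(b for named in runs for b in named if b in brands)
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in brands}


# Hypothetical results: the set of brands named in each sampled answer.
RUNS = [
    {"Example Co", "Rival A"},
    {"Rival A"},
    {"Example Co"},
    set(),
]

if __name__ == "__main__":
    print(citation_rate(RUNS, "Example Co"))
    print(share_of_voice(RUNS, ["Example Co", "Rival A", "Rival B"]))
```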

Multiple runs per prompt

LLM responses are non-deterministic. Sampling parameters, retrieval recency, and ongoing model updates all introduce variance. Sampling the same prompt across multiple runs and at least two model versions per platform gives a useful signal; a single run is noise.
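To see why a single run carries so little information, it helps to put an interval around the observed citation rate. The sketch below uses the Wilson score interval, a standard choice for proportions from small samples; that choice is ours for illustration and is not prescribed by the text above. It also treats runs as independent, which is only approximately true in practice.

```python
import math


def wilson_interval(cited: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval (z = 1.96) for the true citation rate."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = cited / runs
    denom = 1 + z**2 / runs
    center = (p + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # One run that happened to cite the business versus twenty runs with ten citations.
    for cited, runs in [(1, 1), (10, 20)]:
        low, high = wilson_interval(cited, runs)
        print(f"{cited}/{runs} cited -> plausible rate between {low:.2f} and {high:.2f}")
```

The interval for a single run spans most of the possible range, while twenty runs narrow it considerably, which is the practical argument for sampling each prompt repeatedly.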

Crawler and referral logs

Server logs reveal which AI bots actually fetch the site and which pages they prefer. Referral analytics show which platforms send post-answer clicks, which is downstream of citations rather than a substitute for them.
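A first pass over server logs can be as simple as counting requests per AI bot and path. The sketch below assumes the common "combined" access log format and matches on user-agent substrings; the sample lines use simplified, hypothetical user-agent strings and documentation-range IP addresses. It leaves out Google-Extended and Applebot-Extended because, as far as we know, those act as robots.txt control tokens rather than appearing as request user agents.

```python
import re
from collections import Counter

# Tokens expected to appear in request user-agent strings.
AI_BOT_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
]

# Combined log format: "METHOD path proto" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def bot_hits(lines: list[str]) -> Counter:
    """Count requests per (bot token, path); unmatched or non-bot lines are skipped."""
    hits: Counter = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        user_agent = match["ua"].lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in user_agent:
                hits[(token, match["path"])] += 1
                break
    return hits


# Hypothetical log lines with simplified user-agent strings.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:00 +0000] "GET /services HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [01/Jan/2025:00:00:05 +0000] "GET /services HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:00:01:00 +0000] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.9 - - [01/Jan/2025:00:02:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    for (bot, path), count in bot_hits(SAMPLE_LOG).most_common():
        print(bot, path, count)
```

User-agent strings can be spoofed, so counts like these are a starting point; verifying source IP ranges against vendor-published lists, where available, is a sensible follow-up.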

Do ChatGPT, Claude, and Perplexity use the same retrieval system?

No, and our understanding is observational rather than vendor-confirmed. Based on what we have seen, ChatGPT browsing appears to combine Google and Bing index data with OpenAI's own crawler signals. Claude web search appears to draw on Brave and Bing alongside Anthropic's own crawler. Perplexity appears to run its own retrieval and re-ranking pipeline, with citations exposed directly to the user. The base layer of crawlable HTML, clean structured data, and consistent entity signals appears to serve all of them; the specifics of how a page is selected and quoted differ and can change without notice.

Which AI crawlers does a site need to allow in robots.txt?

For coverage across the major answer engines, allow GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, anthropic-ai, PerplexityBot, Perplexity-User, Google-Extended, and Applebot-Extended at minimum. Disallowing any of these can keep the corresponding platform from using the site for real-time retrieval or for training, and it is the most common cause of a site disappearing from one platform without an obvious content reason.

Which search indexes does ChatGPT actually use?

Based on Canonry's observations, ChatGPT browsing appears to draw on Google index data, Bing index data, and OpenAI's own crawler signals together, with entity grounding layered on top. OpenAI has not published a definitive specification, so this should be treated as observed behavior rather than confirmed mechanics. The practical implication: many of the signals that help a page rank well in Bing or Google, plus clean HTML and consistent structured data, also tend to help ChatGPT cite the page.

Which search indexes does Claude actually use?

Based on Canonry's observations, Claude web search appears to draw on Brave index data, Bing index data, and Anthropic's own crawler signals. Anthropic has not published a definitive specification, so this should be treated as observed behavior rather than confirmed mechanics. Pages that present clean structure, reputable third-party corroboration, and verbatim-quotable passages tend to be cited more reliably than pages relying on assertion alone.

What structured data matters most for AI citations?

Organization (or LocalBusiness for place-based businesses) with sameAs pointing to canonical external profiles, Service for each offering, Article for editorial content, BreadcrumbList for navigation context, FAQPage where the matching question and answer pairs exist in the visible body, and Person for named experts. Stable @id values across these blocks turn isolated markup into a connected entity graph that retrievers can reason over.

Does llms.txt actually help with answer-engine visibility?

It is not a guaranteed ranking signal, but it removes ambiguity for browsing-capable systems by providing a curated, citation-ready map of the site. Pair it with a longer llms-full.txt for fuller context. Both must stay in sync with the visible site; drift between the two files and the rendered HTML undermines trust in the source.

Why do the same prompt and the same model return different answers on different runs?

LLM responses are non-deterministic. Sampling parameters, retrieval recency, server-side personalization, and ongoing model updates all introduce variance. Useful measurement averages across multiple runs per prompt and across at least two model versions per platform, then watches trends rather than single answers.

Should NYC businesses optimize separately for each platform?

No. The shared foundation (crawlable HTML, structured data, a consistent entity record, external corroboration, and AI-readable files) accounts for the majority of cross-platform lift. Platform-specific work, like adjusting source pages for Perplexity's dense citation behavior or tuning entity grounding for ChatGPT, is worth doing once the base layer is solid, not before.

Will browsing answer engines see content rendered by JavaScript?

Some can, but with less reliability than server-rendered HTML. The safest approach is to ensure the answer to any question the page is meant to win is present in the initial HTML response, with critical headings, definitions, and structured data delivered without requiring client-side execution.

How do Google AI Overviews relate to AEO work for ChatGPT or Perplexity?

AI Overviews sit on top of Google search results and draw heavily on the Knowledge Graph and ranked pages. The strongest cross-platform sites tend to perform across all answer surfaces because the underlying signals (entity authority, structured data, extractable content) overlap. Optimizing exclusively for one surface tends to leave value on the table.

How long does it take for new content to show up in AI answers?

Browsing-mode citations can appear within days once the page is crawled and indexed. Training-mode references, where a page becomes part of a model's parametric knowledge, can take a full model retraining cycle and are not guaranteed. Most measurable change in the first 30 to 90 days comes from the browsing layer, not from new training.

How does Canonry measure visibility across these platforms?

Citation rate and share of voice against named competitors are tracked across ChatGPT, Claude, Perplexity, Gemini, and Copilot on a fixed cadence, with multiple runs per prompt and at least two model versions per platform. AI crawler hits and referral traffic from each platform are joined to that view from server logs to separate retrieval reach from answer-level citation.

What is the single biggest cross-platform lever for an NYC business?

A clean, internally consistent entity record. One canonical Organization or LocalBusiness JSON-LD with a stable @id, sameAs links to verified profiles (LinkedIn, Crunchbase, Wikidata, Google Business Profile where applicable), and matching NAP across the site, schema, footer, and external directories. Most platform-specific issues sit downstream of this layer.

We treat ChatGPT, Claude, Gemini, Copilot, and Perplexity as a shared visibility problem with platform-specific variation, not isolated hacks. The 16-factor on-site model covers the technical layer common to all of them; prompt-level monitoring across platforms surfaces where additional, platform-specific work is worth doing.

Start with the free audit, then expand into prompt-level work.
