When a CMO asks me "how do we get mentioned by ChatGPT?", my first question back is always: "Which ChatGPT do you mean?" That might sound pedantic, but it matters enormously. The answer a large language model gives today draws from at least three distinct information layers — each with different timelines, different update mechanisms, and different optimisation levers. Conflating them is the single most common strategic mistake I see marketing teams make when they start thinking about generative engine optimisation. So let's pull the layers apart, examine what actually happens inside an AI response, and work out where you can realistically intervene.
Most marketers approach GEO the same way they approached SEO in 2010: as one monolithic channel. Publish good content, build links, hope for the best. But generative AI engines don't work like a single index. They combine pre-trained knowledge, real-time retrieval, and structured entity understanding into a blended response. If you optimise for only one layer, you leave two-thirds of the opportunity on the table — and you won't understand why your interventions aren't producing results.
Generative Engine Optimisation (GEO) is the practice of structuring your brand's content and digital presence so that AI language models accurately cite, reference, and recommend you when answering relevant queries. To do GEO well, you need to understand the architecture you're optimising for. And that architecture has three layers.
Every large language model starts with a training corpus — a massive collection of text scraped from the web, books, academic papers, and other sources. GPT-4, Claude, Gemini — they all have a knowledge cutoff date. Anything the model learned during training is baked into its weights. It doesn't "look up" this information; it already has it, encoded as statistical patterns across billions of parameters.
Here's the thing: you can't change training data retroactively. If your brand was poorly represented — or entirely absent — in the corpus that trained a model released in early 2025, no amount of content published in March 2025 will fix that specific version's knowledge. Training data is historical. It reflects what existed on the web during the crawl window, which for most frontier models is roughly 6 to 18 months before release.
So what can you actually do about this layer? You play the long game. Content you publish today, if it gets cited on authoritative sources, indexed broadly, and referenced across multiple domains, stands a strong chance of being included in future training runs. According to a 2024 analysis by BrightEdge, 58% of AI-generated answers in informational queries drew on content that was at least 12 months old — suggesting that the training data layer still dominates for many query types.
Worth noting: this layer rewards patience. You're not optimising for next quarter's AI mentions. You're building the foundation that makes your brand a durable part of the model's knowledge. Think of it as the GEO equivalent of domain authority — slow to build, hard to lose.
This is where things get interesting — and where most GEO activity produces its fastest results. Nearly every major AI assistant now uses some form of retrieval-augmented generation (RAG). When you ask Perplexity a question, or use Bing Chat, or trigger a Google AI Overview, the system doesn't just rely on what it was trained on. It performs a live web search, pulls in fresh sources, and synthesises those results into its answer.
The retrieval layer is closer to traditional SEO in many respects. The AI engine sends a query — or a decomposed set of sub-queries — to a search index. It retrieves a set of candidate documents. Then it reads, summarises, and cites from those documents. If your page ranks well for relevant queries and is structured in a way that's easy for an AI to extract information from, you have a meaningful chance of being cited.
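The retrieve-then-cite flow described above can be sketched in miniature. This is a toy illustration, not any engine's actual pipeline: the scoring below is naive keyword overlap, where real systems use search indexes, ranking models, and semantic embeddings, and the candidate documents are invented for the example.

```python
# Toy sketch of the retrieval layer: score candidate documents against a
# query by keyword overlap, then treat the best match as the "citation".
# Real AI engines use far more sophisticated ranking; this only shows the
# retrieve -> rank -> cite shape of the process.

def tokenize(text):
    return set(text.lower().split())

def retrieve(query, documents):
    """Return documents with any overlap, sorted by naive overlap score."""
    q = tokenize(query)
    scored = [(len(q & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored if score > 0]

# Hypothetical candidate pages, as an engine might pull from a search index.
docs = [
    "GEO is the practice of structuring content so AI models cite your brand.",
    "Our quarterly newsletter covers office culture and team events.",
    "Entity signals help AI systems understand what a brand is.",
]

results = retrieve("what is GEO and why do AI models cite brands", docs)
print(results[0])  # the direct, extractable definition ranks first
```

Notice that the winning document is the one that states its claim plainly in a single sentence: even in this crude model, extractable phrasing beats buried answers.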
But there are important differences from conventional search. AI retrieval systems tend to favour content that provides direct, concise answers with clear attributions. Long, meandering blog posts that bury the answer under 800 words of preamble perform poorly. The AI is looking for extractable claims — sentences that can be lifted and cited with minimal rewriting.
A good example of retrieval-layer optimisation done well is how HubSpot structures its marketing glossary pages. Each page leads with a crisp one-sentence definition, expands into practical detail, and ends with related questions — all of which makes them extremely retrievable by AI engines. You don't need HubSpot's domain authority to apply the same structural principles.
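The "related questions" pattern can also be expressed as machine-readable markup. Below is a minimal sketch of schema.org FAQPage structured data, built as a Python dict; the property names follow the schema.org vocabulary, while the question-and-answer content is illustrative, not taken from any real page.

```python
import json

# Sketch of schema.org FAQPage markup: each question-answer pair becomes a
# standalone, extractable claim. The Q&A text below is illustrative only.
faq_page = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What is generative engine optimisation?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": ("GEO is the practice of structuring content and "
                         "digital presence so AI models cite and recommend "
                         "a brand in relevant answers."),
            },
        },
        {
            "@type": "Question",
            "name": "How does GEO differ from SEO?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": ("SEO targets rankings in a results list; GEO targets "
                         "citation within AI-generated answers."),
            },
        },
    ],
}

# Embedded in a page inside <script type="application/ld+json"> ... </script>
print(json.dumps(faq_page, indent=2))
```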
This is the layer most marketers haven't thought about yet, and in my experience it's the one that will matter most over the next two to three years. Entity signals are the structured and semi-structured data points that help AI systems understand what your brand is — not just what your website says, but how the broader web defines and categorises you.
Think of it this way: when someone asks an AI "what's the best project management tool for remote teams?", the model doesn't just retrieve blog posts. It draws on an internal representation of entities — companies, products, categories, relationships. If the model has a strong, well-connected entity representation for your brand that associates it with "project management", "remote teams", and "positive user sentiment", you're more likely to be mentioned.
Entity signals come from multiple sources: your Google Business Profile, your Knowledge Panel, Wikidata entries, Crunchbase, LinkedIn company pages, schema markup on your site, mentions in structured databases, and the consistency of how third-party sources describe you. At Arclign, we've started calling this your entity footprint — the sum total of structured signals that tell AI systems who you are, what you do, and how you relate to other entities in your space.
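One of the entity-footprint sources listed above, schema markup, is concrete enough to show. Here is a minimal sketch of schema.org Organization markup built as a Python dict; the property names follow the schema.org vocabulary, but every brand detail, URL, and identifier below is a hypothetical placeholder.

```python
import json

# Minimal sketch of schema.org Organization markup: one concrete entity
# signal. All brand details are hypothetical placeholders; the property
# names ("@context", "sameAs", etc.) follow the schema.org vocabulary.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Consultancy",              # hypothetical brand
    "url": "https://www.example.com",
    "description": "A consultancy specialising in generative engine optimisation.",
    # "sameAs" ties this entity to its profiles elsewhere. Consistency
    # across these sources is itself part of the entity footprint.
    "sameAs": [
        "https://www.wikidata.org/wiki/Q0000000",   # placeholder ID
        "https://www.linkedin.com/company/example",
        "https://www.crunchbase.com/organization/example",
    ],
}

# Embedded in a page inside <script type="application/ld+json"> ... </script>
print(json.dumps(organization, indent=2))
```

The "sameAs" links are doing the heavy lifting here: they explicitly tell machines that the website, the Wikidata entry, and the LinkedIn page all describe the same entity.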
The brands getting this right aren't just publishing content — they're actively managing their structured identity. Canva, for instance, has an exceptionally clean entity footprint: consistent descriptions across platforms, a well-maintained Wikipedia article, robust schema markup, and thousands of third-party mentions that all use similar language to describe its core product category. That consistency is a signal that AI models can rely on.
Some marketers hear "entity signals" and think it's just brand SEO repackaged. It isn't — though the two are related. Traditional brand SEO focuses on making sure your branded search results look good in Google. Entity optimisation for GEO focuses on making sure AI systems have a coherent, accurate, and well-connected understanding of your brand's identity.
The difference matters because AI models don't just retrieve your homepage and read it. They build internal representations — sometimes called knowledge graphs or entity embeddings — that encode relationships. "Arclign is a consultancy. It specialises in generative engine optimisation. It works with B2B SaaS companies. Its team includes former SEO strategists." Each of those associations is an entity signal, and they come not from a single page but from the pattern of information across many sources.
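Those associations can be pictured as subject-predicate-object triples, the basic shape of a knowledge graph. The sketch below is a data-structure illustration only, not a depiction of any model's internals, and the predicate names are invented for the example.

```python
# Toy sketch: the entity associations from the paragraph above, written as
# (subject, predicate, object) triples, the shape of a knowledge graph.
# Predicate names are invented; this illustrates the data structure only.
triples = [
    ("Arclign", "is_a", "consultancy"),
    ("Arclign", "specialises_in", "generative engine optimisation"),
    ("Arclign", "works_with", "B2B SaaS companies"),
    ("Arclign", "team_includes", "former SEO strategists"),
]

def facts_about(entity, graph):
    """Collect every (predicate, object) pair recorded for an entity."""
    return [(p, o) for s, p, o in graph if s == entity]

for predicate, obj in facts_about("Arclign", triples):
    print(f"Arclign --{predicate}--> {obj}")
```

The point of the structure is that no single triple comes from a single page: each one is corroborated (or contradicted) by the pattern of information across many sources.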
The three layers aren't independent — they reinforce each other. Content that performs well in retrieval today is more likely to be included in next year's training data. A strong entity footprint makes it easier for retrieval systems to identify your content as authoritative. And training data shapes the model's baseline understanding of entities, which influences how it interprets and weights retrieved content.
This is why a comprehensive GEO strategy can't focus on just one layer. I've seen companies pour resources into producing retrieval-optimised content — perfectly structured, FAQ-rich, AI-friendly — while completely neglecting their entity signals. The result? They get cited occasionally, but never recommended. The AI mentions their content as a source, but doesn't identify the brand as an authority in the space. The entity layer was missing.
Conversely, some brands have strong entity footprints — everyone knows who they are — but their actual content is poorly structured for retrieval. They rely on the training data layer almost entirely, which means they only show up in AI answers where the model has memorised information about them. For any query that triggers live retrieval, they're invisible.
So where should you start? My take: it depends on your brand's current position, but the sequencing matters. The temptation is to skip straight to content production. Resist it. Without the entity foundation, you're building on sand: establish a coherent structured identity first, then scale retrieval-optimised content, and let the training data layer compound over time.
The brands that will benefit most from the shift to AI-mediated search are those that understand these layers and invest across all three. It's not enough to be "AI-friendly" in some vague sense. You need a specific strategy for each layer, with different tactics, different timelines, and different success metrics.
Training data is your long-term moat. Retrieval is your near-term opportunity. Entity signals are the connective tissue that makes both layers work harder. And the interaction between them is where the real strategic advantage lies — because most of your competitors are still treating GEO as "SEO but for AI", which means they're optimising for one layer at best.
That gap won't last forever. As GEO matures as a discipline, the three-layer model will become common knowledge. The question is whether you'll have built your foundation by then, or whether you'll be playing catch-up.
The three layers of GEO are training data, retrieval, and entity signals. The training data layer refers to information already encoded in an AI model's weights from its pre-training corpus. The retrieval layer involves real-time web searches that AI engines perform to supplement their knowledge with fresh content. The entity signal layer is the structured and semi-structured data that helps AI systems understand what a brand is, what it does, and how it relates to other entities. An effective GEO strategy requires interventions at all three layers.
AI models like GPT-4 and Claude are trained on massive text corpora scraped from the web, books, and academic sources. If your brand was well-represented in those sources during the crawl window — typically 6 to 18 months before a model's release — the model may already 'know' about you and include you in relevant answers. You can't change what's already in a trained model, but you can influence future training runs by publishing substantive content, earning citations on high-authority sites like Wikipedia and industry publications, and maintaining consistent brand descriptions across the web.
Entity signals are the structured and semi-structured data points that help AI systems build an internal representation of your brand — including what you are, what category you belong to, and how you relate to other known entities. These signals come from sources like Google Knowledge Panels, Wikidata, Crunchbase, LinkedIn, schema markup, and the consistency of third-party descriptions. Entity signals matter because they influence whether AI models recommend your brand as an authority in a given space, not just cite a single piece of your content. Brands with strong, consistent entity footprints are significantly more likely to appear in AI-generated recommendations.
To optimise content for AI retrieval, structure it so that key claims and definitions appear early in each section rather than being buried in long preambles. Use clear H2 and H3 headings that mirror natural question phrasing. Include specific data points, named entities, and attributable statistics — AI systems cite specific claims more readily than vague generalisations. Adding FAQ sections with standalone answers is particularly effective, because each question-answer pair maps directly to a potential AI query. According to Semrush's 2025 research, content with structured claims is 3.2 times more likely to be cited in AI-generated responses.
GEO (Generative Engine Optimisation) and traditional SEO share some foundations — both benefit from authoritative content, strong backlinks, and good site structure — but they differ in important ways. SEO focuses on ranking pages in a list of search results, while GEO focuses on being cited, referenced, or recommended within AI-generated answers. GEO requires attention to three distinct layers: training data, real-time retrieval, and entity signals. It also demands a different content structure, emphasising extractable claims, clear definitions, and machine-readable identity data. The two disciplines are complementary, not competing; strong SEO performance tends to improve GEO visibility, particularly in the retrieval layer.
I opened by asking "which ChatGPT do you mean?" — and now you can see why the question matters. The answer a user gets depends on what the model was trained on, what it retrieves in real time, and how well it understands your brand as an entity. Three layers, three sets of levers, three different timelines for results. If there's one thing I'd want you to take away, it's this: stop treating AI visibility as a single problem with a single solution. Map your current presence across all three layers, identify the gaps, and build a strategy that addresses each one. The companies doing this now — systematically and patiently — are the ones that will own their categories in AI search by 2027.
Get a free GEO audit showing exactly how ChatGPT and Perplexity describe your brand today — and what it'll take to reach the top.
Book a Free Audit