Search Engines and Schema Markup: Traditional search platforms like Google and Bing have long encouraged structured data markup (e.g. Schema.org JSON-LD for products, reviews, FAQs) to help them parse and present website content. This remains true in the era of generative AI search. Google’s Search Generative Experience (SGE) and its Gemini models continue to leverage structured data and knowledge graphs when formulating answers. In fact, Google rolled out updates in 2024 to improve how product markup is ingested, signaling heightened interest in accurate e-commerce data.
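To make this concrete, here is a minimal sketch of Product markup of the kind both engines parse. The product name, SKU, price, and rating values are hypothetical; Python is used here simply to assemble and emit the JSON-LD that a crawler would read.

```python
# A minimal sketch of schema.org Product markup, assembled in Python.
# All product values are hypothetical; the emitted <script> tag is the
# piece that search/AI crawlers actually consume from the page HTML.
import json

product_jsonld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Espresso Pro 200",  # hypothetical product
    "description": "15-bar pump espresso machine with 1.2 L tank.",
    "sku": "ESP-200",
    "offers": {
        "@type": "Offer",
        "price": "189.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.6",
        "reviewCount": "212",
    },
}

# Embed this in the product page's <head> so crawlers can parse it.
print(f'<script type="application/ld+json">{json.dumps(product_jsonld, indent=2)}</script>')
```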
Microsoft’s Bing is on a similar path – Fabrice Canel (Principal PM at Bing) confirmed in early 2025 that schema markup on websites “helps Microsoft’s LLMs understand your content.” In practice, this means that well-structured Product, Review, LocalBusiness, and other schema data on a site can feed into how Bing’s AI-powered chat or Copilot summarizes information. Bing’s AI also values fresh data; Canel advised using indexing APIs like IndexNow to push updates so generative models have up-to-date references. Importantly, while Google hasn’t explicitly confirmed (as of 2025) how Gemini uses schema, experts strongly suspect it does, and Google’s documentation still urges site owners to include structured data as “explicit clues” to help “Google understand the content”. In short, existing web standards (schema.org markup, sitemaps, Merchant Center product feeds) remain a primary way businesses can influence what AI-driven search results “know” about their products. These formats are machine-readable and already consumed by the AI’s underlying search index/knowledge graph, even if not always directly by the language model itself.
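For freshness, a site can ping IndexNow whenever product pages change. The sketch below assumes a hypothetical host, key, and URL list; per the IndexNow protocol, the key must also be hosted as a plain-text file at the keyLocation URL.

```python
# A sketch of pushing fresh URLs via IndexNow so AI-backed indexes see
# updates quickly. Host, key, and URLs are hypothetical placeholders.
import requests

payload = {
    "host": "www.example-store.com",
    "key": "a1b2c3d4e5f6",  # your IndexNow key
    "keyLocation": "https://www.example-store.com/a1b2c3d4e5f6.txt",
    "urlList": [
        "https://www.example-store.com/products/espresso-pro-200",
        "https://www.example-store.com/policies/returns",
    ],
}

resp = requests.post(
    "https://api.indexnow.org/indexnow",
    json=payload,
    headers={"Content-Type": "application/json; charset=utf-8"},
)
print(resp.status_code)  # 200/202 indicates the submission was accepted
```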
Structured Content in AI Answers: When generative AI composes an answer about a product or business, it often draws on the structured information gathered by search crawlers. For example, Google SGE uses its Shopping Graph (built from merchant feeds and schema markup) and Knowledge Graph as key databases alongside the LLM. Bing’s AI chat similarly can present product information (price, availability, etc.) drawn from Bing’s indexed data. Notably, the AI answers presented in search often cite sources or link out to the websites – meaning your structured content and messaging must exist on your site in a way that the search/AI can parse and attribute. Businesses should therefore continue implementing rich structured data (for products, ratings, etc.) so that AI overviews can relay accurate facts. In practice, this is an indirect control mechanism – you can’t dictate the exact phrasing the LLM uses, but by providing well-defined data you increase the odds that the AI will present correct, on-brand information (e.g. the right price, model, features) in its responses.
While schema markup was designed for search engines, new LLM-focused standards are being proposed to give businesses more direct control over content consumed by AI models at inference time. A notable example is “llms.txt” – a proposal introduced by Jeremy Howard in Sept 2024 to provide a concise, AI-readable guide to a website. Placed at the root of a site (e.g. yourdomain.com/llms.txt), the file is meant to serve as a roadmap for language models, offering a high-level overview, key facts, and direct links to important content in plain Markdown. The reasoning is that large models struggle to ingest full HTML pages (which contain navigation, scripts, ads, and other noise) within their context window. By curating important information into a structured Markdown file (and optionally providing clean Markdown versions of full pages), site owners can essentially hand AI a distilled version of their messaging. Think of it as analogous to robots.txt, but instead of telling bots what not to crawl, llms.txt tells AI which content is most important and provides it in an easy-to-digest format.
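To make the format concrete, here is an illustrative llms.txt for a hypothetical retailer, following the shape of Howard’s proposal: an H1 title, a blockquote summary, and H2 sections listing key links (with an “Optional” section for lower-priority material). All names and URLs are placeholders.

```markdown
# Example Store

> Example Store sells home espresso equipment. Free shipping over $50;
> 30-day returns on unopened items.

## Products

- [Espresso machines](https://www.example-store.com/espresso.md): full catalog with specs and prices
- [Grinders](https://www.example-store.com/grinders.md): burr grinders from $79

## Policies

- [Returns](https://www.example-store.com/returns.md): 30-day return policy details

## Optional

- [Company history](https://www.example-store.com/about.md): founding story and press
```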
Adoption and Use Cases: The llms.txt concept is very new but is gaining traction in developer and tech-documentation circles. Documentation platforms like Mintlify auto-generate llms.txt files, and companies such as Anthropic (maker of Claude) and Cursor have adopted the standard for their docs via these tools. This early adoption underscores an industry recognition that making content “AI-ready” has value. In principle, llms.txt can be used for any website – proposed use cases range from software docs to legislation to personal portfolios. Notably, e-commerce sites are explicitly envisioned as beneficiaries: “E-commerce: offers product descriptions, pricing, and terms of service in a structured manner, improving AI-driven shopping assistants.” In other words, a retailer could maintain an up-to-date llms.txt summarizing its product lines, brand story, and customer-service policies. An AI assistant (whether a search engine’s AI or a third-party chatbot) could fetch this file and quickly learn the key points about the business – presumably leading to more consistent, brand-aligned answers when customers ask questions. It’s important to note that llms.txt is not an official W3C standard, and adoption is currently voluntary. It’s a grassroots solution, but it has momentum: there is a GitHub spec and a community discussing best practices, and tools like Firecrawl offer APIs to retrieve llms.txt content or converted Markdown from websites for AI consumption (firecrawl.dev). We are still in the early days – major AI platforms haven’t (yet) announced native support for llms.txt – but the concept aligns with a broader movement to make the open web more LLM-friendly. Businesses keen on this can experiment with publishing an llms.txt file, as in the fetch sketch below: at worst it does no harm (legacy crawlers ignore it), and at best it could start surfacing in AI crawl behavior or be leveraged by user-agent tools that wrap LLMs.
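On the consumption side, the retrieval step is trivial for any agent or wrapper tool. A minimal sketch (example.com is a placeholder):

```python
# A trivial sketch of how an AI agent or wrapper tool might retrieve a
# site's llms.txt before (or instead of) crawling HTML pages.
import requests

resp = requests.get("https://example.com/llms.txt", timeout=10)
if resp.ok:
    # Feed the distilled Markdown straight into the model's context window.
    site_summary = resp.text
    print(site_summary[:500])
else:
    print("No llms.txt published; fall back to crawling HTML.")
```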
Beyond web crawling, another way businesses can inject structured content into AI systems is via direct integration (APIs and plugins). OpenAI’s ChatGPT, for instance, introduced a plugin mechanism in 2023 that essentially lets third parties define RESTful APIs for the model to call during a conversation (openai.com). A plugin is described by a machine-readable manifest and an OpenAPI spec, which together tell the language model what endpoints are available and how to use them (openai.com). In practice, this means a company can expose an endpoint – for example, an e-commerce Product Catalog API or a Store FAQ endpoint – and ChatGPT (when the plugin is enabled by a user) can query it deterministically for factual information instead of relying on its own training data. Plugins for services like Expedia, Instacart, OpenTable, and Shopify were early demonstrations, allowing ChatGPT to fetch up-to-date product listings or reservations from those businesses’ databases (openai.com). This gives fine-grained control to the business: the response from the API is authoritative and exactly what the model will relay (aside from any natural-language wrapping). Essentially, the plugin framework is an emerging standardized API contract for LLMs – one that OpenAI defined and which Microsoft’s Bing Chat has also embraced, as Bing is working with OpenAI to support the same third-party plugins (windowscentral.com). (OpenAI has since folded this pattern into “Actions” for custom GPTs, which use the same OpenAPI-based contract.) While not a web standard per se, it’s a pragmatic protocol that any business can use today by building a JSON-based API and manifest. This approach is especially relevant to e-commerce, where a company might want an AI to pull live product availability, current prices, or personalized recommendations via an API call rather than from stale web data.
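As a concrete illustration, the sketch below writes a plugin manifest of the kind served at /.well-known/ai-plugin.json. The field names follow OpenAI’s published manifest format; the store name, endpoints, and spec URL are hypothetical.

```python
# A sketch of a ChatGPT-style plugin manifest, assembled in Python for
# consistency with the other examples. All values are hypothetical; the
# field names follow OpenAI's published ai-plugin.json format.
import json

manifest = {
    "schema_version": "v1",
    "name_for_human": "Example Store",
    "name_for_model": "example_store_catalog",
    "description_for_human": "Browse Example Store's live product catalog.",
    "description_for_model": (
        "Query live product availability, prices, and store policies. "
        "Use this instead of guessing from training data."
    ),
    "auth": {"type": "none"},
    "api": {
        "type": "openapi",
        # OpenAPI spec describing the endpoints the model may call:
        "url": "https://www.example-store.com/openapi.yaml",
    },
    "logo_url": "https://www.example-store.com/logo.png",
    "contact_email": "api@example-store.com",
    "legal_info_url": "https://www.example-store.com/legal",
}

# Served at https://www.example-store.com/.well-known/ai-plugin.json
with open("ai-plugin.json", "w") as f:
    json.dump(manifest, f, indent=2)
```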
Example – ChatGPT Shopping: OpenAI’s recent move into e-commerce search is a case in point. ChatGPT can now return product recommendations with images, prices, and “buy” buttons, then link users out to the retailer’s site to purchase (wired.com). (See image below.) This shopping feature isn’t powered by a single “product database API” that all retailers feed into, but it demonstrates the model’s ability to present structured product info. ChatGPT’s system pulls in data such as product names, descriptions, reviews, and pricing from across the web, likely via a combination of Bing search results and licensed data feeds (wired.com). The screenshot below shows ChatGPT answering a query for the “best espresso machine under $200” with a carousel of machines, each with price and source – and a sidebar listing multiple retailers’ offers for a selected product. These results are organic (not ads) and based on ChatGPT’s analysis of online reviews and listings (wired.com). From a business perspective, the control here comes down to ensuring your product information and reviews are accessible on the web (with accurate schema markup, up-to-date pricing on your site, etc.), because ChatGPT is essentially aggregating existing content. However, one could imagine future iterations where merchants supply a feed or API directly to ChatGPT to ensure accuracy. (OpenAI hasn’t announced a public feed program yet, but given that it has struck content licensing deals – e.g. with the publisher of Wired (wired.com) – the ecosystem is moving in that direction.)
Example of ChatGPT’s shopping interface recommending products (espresso machines) with structured details. Here, the AI presents product cards with images, titles, and prices, then shows a sidebar with “Purchasing options” from various retailers. This demonstrates how large language models can consume and display e-commerce data. In this case, ChatGPT pulled product info and reviews from across the web (wired.com), synthesizing it into a shopper-friendly format. Businesses cannot directly dictate these results today – they are derived from the AI’s web crawling and analysis – but the trend indicates a growing overlap between SEO, product feeds, and AI-driven shopping assistants.
Anthropic Claude and Others: Other AI providers are likewise exploring direct data integration. Anthropic’s Claude, for example, supports a “tools” API that developers can use to give Claude access to external functions or data sources (docs.anthropic.com). In May 2025 Anthropic also introduced a built-in web browsing/search feature for Claude, so it can fetch live information with citations (anthropic.com). While Anthropic hasn’t announced a plugin ecosystem like OpenAI’s, a business could use Claude’s API to build a custom agent that queries the business’s own databases, as sketched below. Many other LLM-powered services (Bard, Perplexity, IBM Watson Assistant, etc.) allow connecting to REST APIs or knowledge bases in custom implementations, even if not via a universal standard.
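A minimal sketch of that pattern using Anthropic’s tools API: the tool name, input schema, and SKU are hypothetical, and the application is responsible for executing the real database lookup when Claude requests it.

```python
# A minimal sketch of exposing a business data source to Claude as a tool.
# Tool name, schema, and SKU are hypothetical; the ANTHROPIC_API_KEY env
# var is assumed. Claude decides when to call the tool; the app runs the
# actual lookup and returns the result in a follow-up message.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "name": "get_product_info",
        "description": "Look up live price and stock for a product SKU.",
        "input_schema": {
            "type": "object",
            "properties": {
                "sku": {"type": "string", "description": "Product SKU"},
            },
            "required": ["sku"],
        },
    }],
    messages=[{"role": "user",
               "content": "Is SKU ESP-200 in stock, and what does it cost?"}],
)

# If Claude chose to use the tool, the response contains a tool_use block;
# the application would now query its database and send back a tool_result.
for block in response.content:
    if block.type == "tool_use":
        print("Claude requested:", block.name, block.input)
```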
In summary, API frameworks are emerging as a way to feed structured content directly to LLMs on a permissioned basis. Right now, this often requires the end-user to activate a plugin or the developer to wire up the connection (i.e. it’s opt-in). But it’s a powerful mechanism for deterministic data retrieval – e.g. guaranteeing that an AI assistant uses your latest inventory or policy information verbatim from your system, rather than a web-scraped guess.
Another piece of the puzzle is enabling AIs to query business content via vector search or databases. This is commonly seen in enterprise settings and specialized chatbots: the company’s documents or product catalog are embedded in a vector database, and the LLM is used in a retrieval-augmented generation (RAG) loop to answer questions based on those embeddings. Major AI platforms support this pattern. OpenAI open-sourced a Retrieval Plugin for ChatGPT that any developer can self-host – it indexes a document corpus into a vector DB and exposes a REST API that the model can call to fetch relevant snippets (openai.com). In effect, this plugin provides a standard way for an AI to do semantic search over a business’s content. For example, a retailer could embed all product descriptions or manuals, and the retrieval plugin would let ChatGPT pull the most relevant text when a user asks a detailed product question. The data returned from the vector store is then presented by the model with full fidelity (often with citations). Similarly, frameworks like LangChain and LlamaIndex enable constructing a custom chatbot that sits on top of a company’s data. While these solutions are more about a business building its own AI assistant, they demonstrate that deterministic, structured querying of data by AI is feasible right now: a vector database essentially serves as a structured knowledge source (the structure being embedding vectors with an index) that the AI can query at runtime.
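A minimal sketch of that RAG loop, assuming the OpenAI Python SDK and a small in-memory index standing in for a real vector database; the product snippets and question are hypothetical.

```python
# A minimal retrieval-augmented generation sketch. An in-memory numpy
# index stands in for a real vector database; corpus and query are
# hypothetical. Assumes OPENAI_API_KEY is set in the environment.
import numpy as np
from openai import OpenAI

client = OpenAI()

docs = [
    "Espresso Pro 200: 15-bar pump, 1.2 L tank, $189. Warranty: 2 years.",
    "Return policy: unopened items may be returned within 30 days.",
    "MilkMaster frother: stainless steel, $49, ships in 2-3 business days.",
]

def embed(texts):
    """Embed a batch of texts via OpenAI's embeddings endpoint."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)  # in production this index lives in a vector DB

def answer(question, k=2):
    # Rank documents by cosine similarity to the question embedding.
    q = embed([question])[0]
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    context = "\n".join(docs[i] for i in np.argsort(sims)[::-1][:k])
    # Ground the model's answer in the retrieved snippets only.
    chat = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using only this data:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return chat.choices[0].message.content

print(answer("What's the warranty on the Espresso Pro 200?"))
```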
It’s worth noting that no single industry-wide protocol yet lets any AI agent automatically tap into a business’s internal database – it usually requires configuration or partnership. However, the use of open APIs and standards (OpenAI’s plugin spec, or even GraphQL endpoints exposed for AI) could converge toward a scenario where, say, a future web crawler or AI agent first checks whether a site offers an AI interface (llms.txt, an AI sitemap, or an open plugin endpoint) and uses it to retrieve information in a structured way – a sketch of that discovery step follows below. We are already seeing early signs: Microsoft’s Bing has urged webmasters to ensure quality content and schema for AI, and even to use IndexNow to feed fresh changes (linkedin.com); OpenAI’s plugin ecosystem and Bing’s adoption of it hint at a common API layer for AI data access; and community proposals like llms.txt aim to standardize how raw web content can be streamlined for AI parsing.
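There is no agreed discovery protocol yet, so the following is a speculative sketch of that first step; the set of well-known paths probed is an assumption, not an established standard.

```python
# A speculative sketch of AI-interface discovery: probe a site for
# AI-facing resources before falling back to a full HTML crawl. The
# candidate paths are assumptions, not a standardized list.
import requests

CANDIDATE_PATHS = ["/llms.txt", "/llms-full.txt", "/.well-known/ai-plugin.json"]

def discover_ai_interfaces(domain):
    found = {}
    for path in CANDIDATE_PATHS:
        url = f"https://{domain}{path}"
        try:
            resp = requests.get(url, timeout=5)
            if resp.ok:
                found[path] = resp.text[:200]  # keep a short preview
        except requests.RequestException:
            pass  # unreachable, or no such interface offered
    return found

print(discover_ai_interfaces("example.com"))
```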
Support for the User’s Vision: The user’s architectural vision – where businesses can “control their public-facing messaging” in structured, machine-readable form for LLMs, possibly via deterministic queries (REST APIs, vector DBs) – is partially in place but still evolving. Here’s the state of play:
Feasibility Today: The architectural model described is partially feasible right now. A business serious about optimizing for AI can: (1) implement structured data markup everywhere and keep it updated – this directly feeds Google/Bing AI results (schemaapp.com); (2) publish an llms.txt file or other distilled content pages to assist any AI agent that looks for them; (3) offer an API or plugin that AI could use – even if usage is limited now, it positions the business for future integrations; and (4) use retrieval techniques to power its own AI assistants or customer-facing chatbots with authoritative data. What’s missing is a universal, hands-free way for third-party AI systems to automatically discover and use your structured messaging – we’re moving in that direction, but not all the pieces have clicked into place yet. Companies like OpenAI and Google are actively researching how to ground LLM responses in trusted data, so it’s reasonable to expect pilot programs soon in which, say, retailers feed product info directly to AI platforms. For now, businesses can pilot this vision with the tools at hand (e.g. a ChatGPT plugin tied to a live database, or partnering with Bing’s index via Merchant Center feeds and IndexNow). Early adopters of these methods are essentially helping define how businesses and AI will interact. Given the rapid developments in 2024–2025, it’s quite plausible that standards like llms.txt, or more API-driven AI integrations, will become mainstream in the near future – enabling companies to truly “optimize messaging” for AI, just as SEO enabled optimizing for search engines.