llms.txt is a simple Markdown file you place at the root of your domain at `/llms.txt`. It gives AI models like ChatGPT, Claude, and Perplexity a curated map of your most important content: a short description of your site plus an ordered list of links to the documents an AI system should read. The proposal comes from Australian developer Jeremy Howard (co-founder of fast.ai and Answer.AI) and was published in September 2024.
The core idea: AI systems have narrow context windows and struggle to process navigation-heavy HTML pages full of menus, scripts, and ads. Instead, llms.txt serves clean, pre-digested Markdown at a predictable location. This guide explains exactly what the file is, why it increasingly matters for AI search, how to create one in clear steps, and how `llms-full.txt` differs from it.
What is llms.txt?
llms.txt is a Markdown file at `https://your-domain.com/llms.txt`, laid out in an agreed format, that shows a large language model at a glance which of your content matters most and where it lives. It is deliberately both machine- and human-readable: no complex syntax, just Markdown. That makes it fundamentally different from `robots.txt`, which steers crawlers in its own format, and from `sitemap.xml`, which lists every URL in XML.
The format follows a fixed structure. It starts with an H1 carrying your project name, the only element the proposal strictly requires. Below it sits a short blockquote that explains in one or two sentences what the site is about. Optional free-form detail (paragraphs or lists, but no further headings) can follow, then H2 sections containing link lists in the form `[Title](URL): short note`. An optional `## Optional` section marks content a model may skip when the context window gets tight.
An important clarification: llms.txt is a community proposal, not an official standard from a body like IETF or W3C. It is a convention that spreads through adoption — much like `robots.txt` did in its early years.
Where the proposal comes from
Jeremy Howard, who published the proposal on llmstxt.org in September 2024, is no stranger to the field: he is regarded as one of the most influential practitioners in applied deep learning and built fast.ai, one of the most widely used machine-learning education communities. His starting problem was concrete and practical. While building AI tools for his own documentation, his team found that models failed on HTML pages, not because the content was missing, but because it was buried under layout, scripts, and navigation.
The solution was deliberately minimal. Rather than inventing a new, complex protocol, Howard reached for Markdown, a format language models handle well, plausibly because it is so common in the text they are trained on. The choice of the filename `llms.txt` and its fixed position at the domain root was a deliberate nod to established conventions like `robots.txt`: familiar, predictable, and with no learning curve for developers. That very simplicity explains why the proposal spread so quickly.
What an llms.txt file looks like
A typical llms.txt has four building blocks: an H1 title, a blockquote summary, H2 sections with link lists, and an `## Optional` section. Here is a concrete example for a fictional documentation site:
```
# Example Project

> A tool to organize and version AI prompts.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): Up and running in 5 minutes
- [API Reference](https://example.com/docs/api.md): Every endpoint explained

## Optional

- [Changelog](https://example.com/changelog.md): Version history
```
Three design principles sit behind this. First, curation: you select rather than dump everything — the opposite of a complete sitemap. Second, Markdown: each linked resource should ideally also exist as a `.md` version so the model gets clean text instead of HTML. Third, priority: order signals importance, and `## Optional` lets models drop the right things first when budget is tight.
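To make the structure concrete, here is a minimal sketch of how a consumer might read such a file. The function name `parse_llms_txt`, the regular expression, and the returned dictionary shape are illustrative assumptions, not part of the proposal, and the sketch ignores free-form detail paragraphs.

```python
import re

# Matches "- [Title](URL): note"; the note part is optional.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Split an llms.txt document into title, summary, and H2 link sections."""
    result = {"title": None, "summary": None, "sections": {}}
    current = None  # name of the H2 section we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("> ") and result["summary"] is None:
            result["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                result["sections"][current].append(match.groupdict())
    return result


SAMPLE = """# Example Project

> A tool to organize and version AI prompts.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): Up and running in 5 minutes
- [API Reference](https://example.com/docs/api.md): Every endpoint explained

## Optional

- [Changelog](https://example.com/changelog.md): Version history
"""

parsed = parse_llms_txt(SAMPLE)
```

Because Python dictionaries keep insertion order, `parsed["sections"]` preserves the order of the H2 sections, which is the priority signal the file is meant to carry.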
llms.txt vs. robots.txt vs. sitemap.xml
The three files are often confused but serve different purposes. A quick comparison clears it up:
| File | Format | Purpose | Audience |
|---|---|---|---|
| robots.txt | Custom rule format | Allow/disallow crawling | Search and AI crawlers |
| sitemap.xml | XML | Full URL list for indexing | Search engine crawlers |
| llms.txt | Markdown | Curated content for LLM inference | AI language models |
The key distinction: `robots.txt` and `sitemap.xml` address crawlers that index your site. llms.txt addresses models that read your content at inference time to build it into an answer. We go deeper in [ai.txt vs. llms.txt vs. robots.txt](/magazin/ai-txt-vs-llms-txt-robots).
Why does llms.txt matter for AI search?
llms.txt matters because the way people find information is shifting: away from the classic list of blue links and toward AI-generated answers. When a model can read your content cleanly and cite it correctly, your odds of appearing in those very answers rise. llms.txt lowers the barrier by removing friction from the reading step.
The shift is measurable. According to Gartner (press release, February 2024), traditional search engine query volume is projected to drop by around 25 percent by 2026 as users migrate to AI chatbots and virtual agents. In parallel, OpenAI reports that ChatGPT reached over 800 million weekly active users by 2025 (OpenAI, 2025). These users ask questions whose answers are assembled from web content — and every source that is hard to read gets pulled in less often.
The context-window problem
The technical reason for llms.txt lies in the limited context window of language models. A model can only process a finite amount of text per request. Yet a typical HTML page is largely made up of navigation, scripts, tracking, cookie banners, and ads — ballast that burns budget without contributing to the answer.
llms.txt solves this by supplying pre-digested Markdown: only the relevant content, no scaffolding. That saves tokens, reduces the risk of the model "losing" the actual content, and raises the likelihood of a correct, well-cited answer. Instead of letting the model guess which part of your page counts, you tell it directly.
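A rough way to see the ballast is to compare the size of a page with the size of its visible text. The sketch below uses Python's standard `html.parser`; the sample page, the set of skipped tags, and the use of character counts as a stand-in for tokens are all simplifying assumptions.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>, <style>, and <nav> content."""

    SKIP = {"script", "style", "nav"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside a skipped element
        self.chunks = []  # visible text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text a reader would see, joined with single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: one heading and one sentence wrapped in scaffolding.
PAGE = """<html><head><script>track('pageview');</script>
<style>.menu{display:flex}</style></head><body>
<nav><a href="/">Home</a><a href="/pricing">Pricing</a></nav>
<main><h1>Quickstart</h1><p>Install the CLI and run init.</p></main>
</body></html>"""

text = visible_text(PAGE)
ratio = len(text) / len(PAGE)  # share of the page that is actual content
```

Real pages carry far more markup than this toy example, so the share of useful content is typically lower still. A Markdown version skips the extraction step entirely.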
Who benefits most?
Not every website needs llms.txt equally. The biggest beneficiaries are sites whose value lies in structured, factual content:
- Software documentation: APIs, SDKs, and guides that AI coding assistants should reproduce correctly.
- Knowledge bases and help centers: support content that chatbots draw on for answers.
- Product and pricing pages: so AI answers represent your features and terms accurately.
- Magazines and expert blogs: content meant to appear as a citable source in AI answers.
Pure marketing landing pages with little substance benefit less. The common thread: if you want AI to reproduce your facts correctly, you help it find those facts cleanly.
An honest look at adoption
llms.txt is no guarantee of visibility, and that deserves to be said plainly. As of 2026, no major provider — neither OpenAI, Anthropic, nor Google — has publicly confirmed that its models actively read llms.txt to generate answers. Google's John Mueller was openly skeptical in 2025, comparing the file to the largely ignored `keywords` meta tag.
Even so, the convention has gained momentum: providers like Anthropic, Stripe, Cloudflare, and many dev-tool companies publish llms.txt files, and directories such as directory.llmstxt.cloud list thousands of them. The pragmatic take for 2026: low effort, no harm, potential upside as more tools start using the file — a classic no-regret move within a broader [Generative Engine Optimization](/magazin/generative-engine-optimization-guide) strategy.
How AI systems actually use the file today
Even though the major model providers do not automatically fetch llms.txt when answering every question, the file is already being used concretely — just in places many people do not expect. The most common real use case is agentic tools and IDE assistants: when a developer loads a docs URL into Cursor, a coding agent, or a custom GPT, these tools can specifically look for `/llms.txt` to grab the clean Markdown version instead of the HTML scaffolding.
A second use case is manual prompting: users paste the contents of an `llms-full.txt` directly into a model with a large context window to ask questions about an entire documentation set. Third, RAG pipelines (retrieval-augmented generation) increasingly treat llms.txt as a preferred, pre-cleaned source. The common thread: the file pays off most today wherever a human or an agent deliberately decides to pull in your content — and finds it cleanly available at exactly that moment.
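How a tool might look for the file can be sketched in a few lines. This is a hypothetical helper, not a description of how any specific product works: the function names and the injected `fetch` callable (which returns the body text or `None`) are assumptions made so the logic can run without network access.

```python
from urllib.parse import urlsplit, urlunsplit


def llms_txt_candidates(page_url: str) -> list[str]:
    """Return the root-level llms.txt and llms-full.txt URLs for a page URL."""
    parts = urlsplit(page_url)
    root = (parts.scheme, parts.netloc)
    return [
        urlunsplit((*root, "/llms.txt", "", "")),
        urlunsplit((*root, "/llms-full.txt", "", "")),
    ]


def first_available(page_url: str, fetch):
    """Try each candidate in order; return (url, body) or None if none exist."""
    for url in llms_txt_candidates(page_url):
        body = fetch(url)
        if body is not None:
            return url, body
    return None


# Stand-in for a real HTTP client: only llms.txt exists on this fake site.
fake_site = {"https://example.com/llms.txt": "# Example Project\n"}
found = first_available("https://example.com/docs/guide/intro?x=1#top", fake_site.get)
```

Passing the fetcher in as a parameter keeps the lookup logic separate from the HTTP client, so an agent could fall back to the HTML page whenever `first_available` returns `None`.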
How do you create an llms.txt file?
You create an llms.txt in five steps: select your important content, write the file in the standard format, optionally provide Markdown versions of the target pages, place the file at your domain root, and test the result. For a smaller site the whole process takes under an hour and needs no special software.
Here are the steps in detail:
1. Curate content. List the 10 to 30 most important URLs on your site: docs, core products, pricing, key guides. For each, ask: "Should an AI be able to reproduce this correctly?"
2. Write the file. Start with `# Project Name`, then a blockquote summary, then H2 sections with link lists in the form `[Title](URL): note`.
3. Provide Markdown versions. Ideally create a `.md` version of each target page (e.g. `page.html` and `page.html.md`) so models receive clean text.
4. Place it at the root. Save the file as `llms.txt` at the root, reachable at `https://your-domain.com/llms.txt` with content type `text/plain` or `text/markdown`.
5. Test. Open the URL in a browser, verify the Markdown syntax, and give an AI model the link with a request to summarize your site.
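Step 2 can be scripted once the curated list exists. The following is a minimal sketch, assuming the curated content is held as a plain dictionary; `render_llms_txt` and its argument layout are hypothetical names chosen for illustration.

```python
def render_llms_txt(name, summary, sections):
    """Render an llms.txt string from a project name, summary, and sections.

    `sections` maps an H2 title to a list of (title, url, note) tuples,
    in the order they should appear in the file.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


content = render_llms_txt(
    "Example Project",
    "A tool to organize and version AI prompts.",
    {
        "Docs": [
            ("Quickstart", "https://example.com/docs/quickstart.md",
             "Up and running in 5 minutes"),
        ],
        "Optional": [
            ("Changelog", "https://example.com/changelog.md", "Version history"),
        ],
    },
)
```

Writing `content` to `llms.txt` in the site's public root as part of the build covers step 4 as well. The curation in step 1 stays a human decision.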
Curating the content correctly
The most important and most underrated step is selection. llms.txt is not a dumping ground for all URLs — that would be a sitemap. You make an editorial decision: what must an AI know about you to represent you correctly? Keep the list focused. Ten perfectly chosen links are worth more than two hundred random ones.
Structure thematically via H2 sections: for example `## Docs`, `## Product`, `## Company`. Add a short, precise note after the colon on each link — it helps the model decide whether the link fits the question. Anything nice-to-have but non-essential belongs under `## Optional`. This gives the model a clear priority order for when not everything fits in the context window.
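What "priority order" could mean on the consuming side is sketched below. This is one possible greedy strategy under stated assumptions, not behavior the proposal prescribes: the function `select_links`, the per-link token costs, and the budget are all hypothetical.

```python
def select_links(sections, budget):
    """Pick links in file order until `budget` is spent, taking Optional last.

    `sections` maps an H2 title to a list of (url, cost) pairs, where cost is
    an estimated size of the linked document in tokens.
    """
    ordered = [name for name in sections if name != "Optional"]
    if "Optional" in sections:
        ordered.append("Optional")  # optional content is considered last
    chosen, remaining = [], budget
    for name in ordered:
        for url, cost in sections[name]:
            if cost <= remaining:  # skip anything that no longer fits
                chosen.append(url)
                remaining -= cost
    return chosen


# Hypothetical sizes; "Optional" is listed first on purpose to show reordering.
sections = {
    "Optional": [("https://example.com/changelog.md", 300)],
    "Docs": [
        ("https://example.com/docs/quickstart.md", 500),
        ("https://example.com/docs/api.md", 800),
    ],
}

roomy = select_links(sections, 1300)
tight = select_links(sections, 600)
```

With a budget of 1300 both docs pages fit and the changelog is dropped. With 600 only the quickstart is selected, since neither the API reference nor the changelog fits in what is left.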
Tools that help
You do not have to write llms.txt by hand. A growing tool landscape emerged in 2025:
- firecrawl.dev/llmstxt crawls a domain and generates both `llms.txt` and `llms-full.txt` automatically.
- llmstxt.firecrawl.dev and similar generators produce a draft from your sitemap.
- CMS plugins for WordPress, Webflow, and other systems increasingly offer automatic llms.txt output.
- Static site generators like Docusaurus, VitePress, or Mintlify can emit `.md` mirror versions of your pages at build time.
One important warning: generated files are a starting point, not a finished result. A tool does not know your priorities. Always review and trim the draft by hand, or you end up back at an unfiltered link list, which is exactly what llms.txt is meant to avoid. If you maintain several discovery files in parallel, the boundaries between them are laid out in [ai.txt vs. llms.txt vs. robots.txt](/magazin/ai-txt-vs-llms-txt-robots).
Verify, serve, and keep it current
After writing comes the part most people underestimate: correct serving and maintenance. First verify the file is actually reachable at `/llms.txt` and served with the right content type — `text/plain` or `text/markdown`. A common mistake is that the server returns the file as HTML or with a 404, because it sits in the wrong directory or a routing rule intercepts it.
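The serving check can be automated. Below is a minimal sketch using only the standard library; `check_llms_response` and `probe` are hypothetical helper names, and the accepted content types are the two named above.

```python
import urllib.error
import urllib.request

ACCEPTED_TYPES = {"text/plain", "text/markdown"}


def check_llms_response(status, content_type):
    """Return a list of problems with an HTTP response for /llms.txt."""
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"unexpected content type {media_type or 'missing'}")
    return problems


def probe(url):
    """Fetch `url` and run the checks above (performs a real network request)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return check_llms_response(resp.status, resp.headers.get("Content-Type"))
    except urllib.error.HTTPError as err:
        return check_llms_response(err.code, err.headers.get("Content-Type"))
```

Calling `probe("https://your-domain.com/llms.txt")` after a deploy returns an empty list when everything is in order. A 404 routed to an HTML error page would show up as two problems at once.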
A practical end-to-end test beats any theory: give a model like Claude or ChatGPT the link to your `llms.txt` and ask it to summarize your website in five sentences. If the result is accurate, the file works. If it is not, you immediately see which content is missing or misread. Also plan to maintain the file with every major content update — ideally automated in your build process so it never goes stale. An outdated llms.txt is worse than none, because it actively feeds AI systems wrong facts.
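One way to catch staleness in a build is to compare the links in the file against the pages the build actually published. The sketch below assumes the build can supply that set of URLs; `stale_links` and the sample data are illustrative.

```python
import re

# Captures the URL part of Markdown links of the form [Title](https://...).
URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def stale_links(llms_text, published_urls):
    """Return linked URLs, in file order, that are not in `published_urls`."""
    linked = URL_RE.findall(llms_text)
    return [url for url in linked if url not in published_urls]


llms_text = (
    "- [Quickstart](https://example.com/docs/quickstart.md): Start here\n"
    "- [Old guide](https://example.com/docs/v1.md): Legacy setup\n"
)
published = {"https://example.com/docs/quickstart.md"}

missing = stale_links(llms_text, published)
```

Failing the build when `missing` is non-empty turns a silent drift into an error someone has to look at.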
What goes in llms-full.txt?
`llms-full.txt` contains the entire content of your important pages as a single, long Markdown document — not just links, but the full written-out text. Where `llms.txt` is a compact map with references, `llms-full.txt` is the complete book: a model can ingest it in one pass without following each link individually.
The difference is fundamental and tied to the use case. `llms.txt` suits situations where a model should navigate selectively and context is scarce. `llms-full.txt` suits a model with a large context window that should absorb your whole documentation at once — such as a coding assistant that wants your entire API reference in memory. Anthropic's own docs publish both variants; their `llms-full.txt` runs to hundreds of thousands of tokens.
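Assembling the full-text variant can be as simple as concatenating the Markdown versions of your pages. The layout used here (H1, blockquote, one H2 per page) is an assumption that mirrors `llms.txt`; the text above does not prescribe an internal structure for `llms-full.txt`, and `build_llms_full` is a hypothetical helper.

```python
def build_llms_full(name, summary, pages):
    """Join full-text pages into one Markdown document.

    `pages` is a list of (title, markdown_body) pairs in priority order.
    """
    parts = [f"# {name}", "", f"> {summary}"]
    for title, body in pages:
        # One H2 per page, separated from its neighbors by blank lines.
        parts += ["", f"## {title}", "", body.strip()]
    return "\n".join(parts) + "\n"


full_text = build_llms_full(
    "Example Project",
    "A tool to organize and version AI prompts.",
    [
        ("Quickstart", "Install the CLI, then run `init`.\n"),
        ("API Reference", "`GET /prompts` lists all prompts.\n"),
    ],
)
```

If the page bodies contain their own H1 or H2 headings, they would need to be demoted first so the combined document keeps a consistent outline.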
When to use which file?
The choice depends on the size and purpose of your content. A simple rule of thumb:
| Criterion | llms.txt | llms-full.txt |
|---|---|---|
| Content | Links + short notes | Complete full text |
| Size | Small (a few KB) | Large (often >100 KB) |
| Best case | Selective navigation | Whole docs at once |
| Context need | Low | High (large window required) |
In practice the two are not mutually exclusive. Many providers publish both: `llms.txt` as a curated entry point and `llms-full.txt` as the complete knowledge base. Generators like Firecrawl produce both in parallel too. For smaller sites `llms.txt` alone often suffices; once you have extensive documentation, the full-text variant becomes worthwhile as an addition.
A useful pattern from the field: link your `llms-full.txt` as an optional entry inside your `llms.txt`. That way a model finds the compact map first and can reach for the full text only when needed. But watch the size: an `llms-full.txt` of several hundred thousand tokens will not fit every context window. Treat it as a supplementary offering for models that have the room, never as mandatory content of the curated file.
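The size trade-off can be expressed as a simple decision rule. Everything numeric here is an assumption: the four-characters-per-token estimate is a crude heuristic that varies by model and language, and the `reserve` share of the window is an arbitrary illustrative default.

```python
def estimate_tokens(text):
    """Very rough token estimate: about four characters per token (assumed)."""
    return max(1, len(text) // 4)


def pick_file(llms_full_txt, window, reserve=0.5):
    """Choose the full text only if it fits in the allowed share of the window.

    `window` is the model's context size in tokens; `reserve` is the fraction
    of it we are willing to spend on site content.
    """
    budget = int(window * reserve)
    if estimate_tokens(llms_full_txt) <= budget:
        return "llms-full.txt"
    return "llms.txt"


full_doc = "x" * 4000  # stands in for a full-text file of roughly 1000 tokens

large_window_choice = pick_file(full_doc, window=8000)
small_window_choice = pick_file(full_doc, window=1000)
```

A real consumer would use the model's own tokenizer instead of a character count, but the shape of the decision stays the same.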
Avoiding common mistakes
The same mistakes keep appearing when creating both files. Avoid these:
- Dumping HTML instead of Markdown. llms-full.txt should be clean Markdown, not copied HTML with tags and scripts.
- Freezing stale content. The file must be updated when content changes, or the AI will cite outdated facts. Automate generation in your build.
- Accidentally exposing sensitive data. Everything in the file is publicly readable. Do not include internal URLs, tokens, or drafts.
- Confusing the file with the sitemap. llms-full.txt is content, not a URL list — and `llms.txt` is curation, not completeness.
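Some of these mistakes can be caught mechanically before publishing. The patterns below are deliberately simple, illustrative assumptions (they will miss cases and may flag harmless lines), and `lint` is a hypothetical helper rather than an established tool.

```python
import re

# Label -> pattern; each pattern is a heuristic, not an exhaustive detector.
CHECKS = {
    "internal host": re.compile(
        r"https?://(?:localhost|127\.0\.0\.1|[\w.-]+\.internal)\b"
    ),
    "possible secret": re.compile(
        r"(?i)\b(?:api[_-]?key|token|secret)\s*[=:]\s*\S+"
    ),
    "html tag": re.compile(r"</?(?:script|div|span|nav)\b"),
}


def lint(text):
    """Return (line_number, label) pairs for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in CHECKS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


DRAFT = """# Example Project
- [Staging](http://localhost:8080/docs): preview build
api_key = abc123
<div>leftover markup</div>
- [Docs](https://example.com/docs.md): public docs
"""

findings = lint(DRAFT)
```

A check like this is a safety net for a build pipeline. It does not replace reading the file before it goes live.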
Treat both files as part of your content, not as a one-time configuration. Kept current and honestly curated, they are a cheap building block of your AI visibility — embedded in a broader [GEO strategy](/magazin/generative-engine-optimization-guide).
Conclusion
llms.txt is a curated Markdown map of your website for AI models, placed at `/llms.txt`. It solves a concrete problem — limited context windows and unreadable HTML — by serving pre-digested, prioritized content at a predictable location. `llms-full.txt` extends that with the complete full text for models with large context.
Whether the major providers will read the file at scale in 2026 remains open. But the effort is small, the risk practically zero, and once more tools rely on it, you are prepared. Create a curated llms.txt, keep it current, and treat it as a no-regret building block of a deliberate AI visibility strategy.
