Technical Readiness
Make sure your site is set up so AI crawlers can read and cite you — robots.txt, llms.txt, schema markup, and content structure.
Technical Readiness is the 15% component of your ModelScore that measures whether your site itself is set up so AI platforms can read and cite you. Get the basics right and AI crawlers index your content correctly. Get them wrong and you can be the most authoritative voice in your category and still be invisible to AI because the gates are closed.
The good news: most issues here are quick to fix once you know about them. Most operators move from a 30-50 Technical Readiness score to 70-90+ in less than a day’s work.
This article walks through every check CiteMetrix runs, what passing each one requires, and how to address common failures.
How CiteMetrix checks Technical Readiness
On every scan, CiteMetrix fetches three things from your domain:
- https://yourdomain.com/robots.txt — your robots file
- https://yourdomain.com/llms.txt — your llms.txt file (a newer convention)
- https://yourdomain.com/ — your homepage HTML, parsed for schema markup and heading structure
Each check produces points; points sum to a 0-100 Technical Readiness score that feeds into ModelScore.
The check happens automatically as part of normal scans — no separate setup required. Results show on the dashboard’s Technical Readiness panel, with each individual check pass/fail and an actionable explanation for any fail.
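The way the checks roll up into a score can be sketched as a small pure function. The point values come from the score breakdown later in this article; the function and variable names are illustrative, not CiteMetrix's actual code, and the even 5-points-per-crawler split is an assumption.

```python
# Hypothetical sketch of how a Technical Readiness score could be
# assembled from the individual checks this article describes.
# Point values: robots.txt accessible (20), AI crawlers allowed (30),
# llms.txt present (20), schema markup (15), heading structure (15).

# The six crawlers from the table below (anthropic-ai / ClaudeBot
# counted as one entry).
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "CCBot",
    "Google-Extended", "anthropic-ai", "PerplexityBot",
]

def technical_readiness_score(
    robots_accessible: bool,
    allowed_crawlers: list,
    has_llms_txt: bool,
    has_schema: bool,
    heading_count: int,
) -> int:
    score = 0
    if robots_accessible:
        score += 20
    # 30 points split evenly across the six monitored crawlers
    score += round(30 * len(allowed_crawlers) / len(AI_CRAWLERS))
    if has_llms_txt:
        score += 20
    if has_schema:
        score += 15
    if heading_count >= 3:
        score += 15
    return score
```

A site passing everything scores 100; a site with only an accessible robots.txt that blocks every AI crawler scores 20.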
Robots.txt and AI crawlers
Your robots.txt tells crawlers what they’re allowed to access. Most existing robots files were written before AI crawlers existed, and many sites accidentally block them with a wildcard Disallow: /.
CiteMetrix specifically checks whether the following AI crawlers are allowed:
| Crawler user-agent | Operated by | What it does |
|---|---|---|
| GPTBot | OpenAI | Crawls for ChatGPT training data |
| ChatGPT-User | OpenAI | Used when ChatGPT users ask it to fetch URLs |
| CCBot | Common Crawl | Open web crawl used by many AI training pipelines |
| Google-Extended | Google | Controls Gemini and Bard access (separate from regular Google search) |
| anthropic-ai / ClaudeBot | Anthropic | Crawls for Claude training and retrieval |
| PerplexityBot | Perplexity | Crawls for Perplexity AI search |
For each crawler, CiteMetrix checks whether your robots.txt explicitly disallows it via a pattern like:
```
User-agent: GPTBot
Disallow: /
```
If that pattern is present, the crawler is blocked and you lose points. If the pattern is absent (or set to allow), the crawler can access your site.
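You can run this kind of check against your own robots.txt locally with Python's standard-library robots parser. This is a sketch: `urllib.robotparser` applies standard robots.txt semantics, which may differ slightly from how individual AI crawlers interpret edge cases, and the sample file below is illustrative.

```python
# Check which AI crawlers a robots.txt allows, using the standard
# library's urllib.robotparser. The sample file blocks GPTBot only.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

User-agent: GPTBot
Disallow: /
"""

AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "CCBot",
               "Google-Extended", "anthropic-ai", "PerplexityBot"]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in AI_CRAWLERS:
    allowed = parser.can_fetch(bot, "https://yourdomain.com/")
    print(f"{bot}: {'allowed' if allowed else 'BLOCKED'}")
```

In this sample, GPTBot is blocked by its explicit `Disallow: /` group, while the other five crawlers fall back to the `User-agent: *` group and are allowed.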
Recommended robots.txt for AI visibility:
```
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: CCBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```
Adjust the Disallow patterns in the `User-agent: *` group to match anything on your site you want to keep away from crawlers (typically admin paths, draft content, etc.). The per-AI-bot blocks above use an explicit `Allow: /` to make the intent obvious; omitting a bot's entry is nearly equivalent, because an unmatched crawler falls back to whatever your `User-agent: *` group says.
llms.txt
llms.txt is an emerging convention modeled on sitemap.xml. Where a sitemap tells search engines what pages exist, llms.txt tells AI platforms what content is most important — your authoritative pages, your canonical sources, the URLs you’d want AI to prioritize when forming its understanding of your brand.
CiteMetrix checks whether https://yourdomain.com/llms.txt exists and returns a 200 response. Having one earns 20 of the 100 points for Technical Readiness.
Minimum viable llms.txt:
```
# Acme Inc.

> Acme makes asphalt rejuvenators and polymer-modified asphalt for North American highway projects. Founded 2014 in Austin, Texas.

## Core pages

- [About Acme](https://acme.com/about): Company overview, founding story, leadership
- [Products](https://acme.com/products): Full product line with technical specifications
- [Sustainability](https://acme.com/sustainability): Our environmental commitments and certifications
- [Contact](https://acme.com/contact): Sales and support contact info

## Documentation

- [Technical specifications](https://acme.com/specs): Detailed product specs
- [Application guides](https://acme.com/guides): How to use our products
```
The format is markdown. The first heading is your brand name. The blockquote (>) is a one-paragraph summary of who you are. Then markdown headings group your most-cited pages by category, with each link including a brief description.
Don’t overthink this. A short, clear llms.txt is more useful than a long, exhaustive one. AI platforms parse the file looking for “what’s the canonical signal about this brand” — a focused list of your 10-30 most important pages is the right shape.
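A quick sanity check for a draft llms.txt can be done with a few regexes: confirm there is a title heading, a summary blockquote, and a reasonable number of described links. This is a sketch, not a formal llms.txt parser; there is no official validator for the convention yet.

```python
# Minimal llms.txt sanity check: title heading, summary blockquote,
# and markdown links extracted with simple regexes (illustrative only).
import re

llms_txt = """\
# Acme Inc.
> Acme makes asphalt rejuvenators for North American highway projects.

## Core pages
- [About Acme](https://acme.com/about): Company overview
- [Products](https://acme.com/products): Full product line
"""

has_title = bool(re.search(r"^# .+", llms_txt, re.MULTILINE))
has_summary = bool(re.search(r"^> .+", llms_txt, re.MULTILINE))
links = re.findall(r"\[([^\]]+)\]\((https?://[^)]+)\)", llms_txt)

print(f"title: {has_title}, summary: {has_summary}, links: {len(links)}")
```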
Schema markup (structured data)
CiteMetrix checks your homepage for JSON-LD schema markup. Specifically, it looks for any `<script type="application/ld+json">` block in the HTML. The presence of schema is worth 15 points.
The schema types that matter most for AI visibility are:
- `Organization` — your basic company information (name, URL, logo, founder, foundingDate, address)
- `WebSite` — your site as a whole (with a `potentialAction` of type `SearchAction` for site search)
- `Article` — for individual content pages (less relevant on the homepage; matters more on blog/article pages)
- `Product` — if you sell specific products
- `FAQPage` — for FAQ-style content; highly favored by AI for retrieval
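The detection side of this check can be approximated locally: pull the JSON-LD blocks out of homepage HTML and report their @type values. The regex-based extraction below is a simplification (a production check would use a real HTML parser), and the embedded Organization snippet is a hypothetical example.

```python
# Find JSON-LD blocks in homepage HTML and report their @type values.
import json
import re

html = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Acme Inc.", "url": "https://acme.com"}
</script>
</head><body></body></html>"""

blocks = re.findall(
    r'<script type="application/ld\+json">(.*?)</script>',
    html, re.DOTALL,
)
types = [json.loads(b).get("@type") for b in blocks]
print(types)  # → ['Organization']
```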
If you’re on WordPress (as CiteMetrix is), schema generation is typically handled by your SEO plugin (Yoast, Rank Math, etc.). Most well-configured installations already have basic Organization and WebSite schema. If your check is failing, the issue is usually:
- An SEO plugin isn't installed
- Schema is being generated but is malformed (validate at validator.schema.org to confirm)
- A theme override is preventing the JSON-LD from rendering
Content structure
The third part of the check looks at the homepage HTML for basic content structure — specifically, the presence of meaningful heading tags (`<h1>` through `<h6>`).
CiteMetrix counts headings on the homepage. If there are 3 or more, the structure check passes (15 points). If fewer, it fails.
Why this matters: AI platforms favor pages with clear hierarchical structure. A homepage that’s mostly visual (a hero image, a few CTAs, no headings) might look beautiful but doesn’t give AI parsers much to work with. Adding semantic structure with proper heading tags is a low-effort win.
A few practical tips:
- One H1 per page, ideally containing your primary keyword or brand name
- H2s for major sections of the page
- H3s for subsections within those
- Avoid using headings purely for visual styling — if it shouldn't be in the document outline, use a styled `<div>` or `<p>` instead
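The heading-count check itself is easy to reproduce locally with the standard library's HTMLParser. This is a sketch of the threshold described above (3+ headings passes), not CiteMetrix's implementation; the sample HTML is hypothetical.

```python
# Count <h1>-<h6> tags in a page's HTML and apply the 3-heading
# threshold from the content-structure check.
from html.parser import HTMLParser

class HeadingCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self.count += 1

html = """<h1>Acme Inc.</h1>
<h2>What we make</h2><h3>Rejuvenators</h3><h2>Who we serve</h2>"""

counter = HeadingCounter()
counter.feed(html)
print(counter.count >= 3)  # 4 headings: passes the structure check
```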
What a perfect Technical Readiness score looks like
100 points requires:
- ✓ robots.txt is accessible (20 points)
- ✓ All 6 AI crawlers explicitly allowed (30 points)
- ✓ llms.txt is present and returns 200 (20 points)
- ✓ Homepage has JSON-LD schema markup (15 points)
- ✓ Homepage has 3+ heading tags (15 points)
Most well-managed marketing sites can achieve 85+ within a few hours of focused work. The hardest item for many operators is the AI crawler allow-list — older sites often have years of accumulated robots.txt rules that need review and cleanup.
What Technical Readiness doesn’t measure
Plenty of things matter for AI visibility that aren’t in the Technical Readiness score:
- Page speed — important for SEO generally, less important for AI crawler success specifically
- Mobile responsiveness — same
- HTTPS — assumed; the check only fetches HTTPS URLs
- Domain authority / backlinks — captured indirectly through Brand Demand and Authority Transfer
- Content depth and quality — the qualitative work that drives hallucination remediation doesn't count here
- Sitemap quality — we check for the sitemap reference in robots.txt but don’t deep-validate the sitemap itself
These are real factors. They just aren’t part of the 15% Technical Readiness slice. Improving them shows up elsewhere in your ModelScore (or in your overall AI citation rate).
Common mistakes
1. Using a “block-all” robots.txt template. Some site builders ship default robots files with broad wildcard blocks. These kill AI visibility instantly. If your Technical Readiness score is below 30 and the AI Crawler check shows everything blocked, this is the cause.
2. Hosting robots.txt in the wrong place. robots.txt must live at the root of your domain (yourdomain.com/robots.txt), not in a subdirectory. Subdirectory robots files don’t work.
3. Adding llms.txt as a regular page. llms.txt should be a plain text file at the root of your domain, served with Content-Type: text/plain (or text/markdown). If you create it as a WordPress page, the URL will likely have a different path and the file will return HTML, not the markdown the spec expects.
4. Schema markup that doesn’t validate. Run your homepage through validator.schema.org. Common issues: missing required fields (Organization without name, Article without author), structural errors (mismatched braces in the JSON), or stale schema referencing pages that no longer exist.
5. Headings used for typography only. If your H2s exist purely because the designer wanted big text, your document outline is broken. Use semantic heading levels for actual document structure; use CSS for visual hierarchy.
Next steps
- Open the Technical Readiness panel in your dashboard and review the per-check pass/fail list
- For any failed check, the panel includes a “Fix this” link with specifics
- Re-scan after each fix to confirm the score moves