Technical Readiness
Make sure your site is set up so AI crawlers can read and cite you — robots.txt, llms.txt, schema markup, and content structure.
Technical Readiness is the 15% component of your ModelScore that measures whether your site itself is set up so AI platforms can read and cite you. Get the basics right and AI crawlers index your content correctly. Get them wrong and you can be the most authoritative voice in your category and still be invisible to AI because the gates are closed.
The good news: most issues here are quick to fix once you know about them. Most operators move from a 30-50 Technical Readiness score to 70-90+ in less than a day’s work.
This article walks through every check CiteMetrix runs, what passing each one requires, and how to address common failures.
How CiteMetrix checks Technical Readiness
On every scan, CiteMetrix fetches three things from your domain:
- https://yourdomain.com/robots.txt — your robots file
- https://yourdomain.com/llms.txt — your llms.txt file (a newer convention)
- https://yourdomain.com/ — your homepage HTML, parsed for schema markup and heading structure
Each check produces points; points sum to a 0-100 Technical Readiness score that feeds into ModelScore.
The check happens automatically as part of normal scans — no separate setup required. Results show on the dashboard’s Technical Readiness panel, with each individual check pass/fail and an actionable explanation for any fail.
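The way the checks roll up into a score can be sketched as a small pure function. The point values come from the score breakdown later in this article; the function and variable names are illustrative, not CiteMetrix's actual code, and the even 5-points-per-crawler split is an assumption.

```python
# Hypothetical sketch of how a Technical Readiness score could be
# assembled from the individual checks this article describes.
# Point values: robots.txt accessible (20), AI crawlers allowed (30),
# llms.txt present (20), schema markup (15), heading structure (15).

# The six crawlers from the table below (anthropic-ai / ClaudeBot
# counted as one entry).
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "CCBot",
    "Google-Extended", "anthropic-ai", "PerplexityBot",
]

def technical_readiness_score(
    robots_accessible: bool,
    allowed_crawlers: list,
    has_llms_txt: bool,
    has_schema: bool,
    heading_count: int,
) -> int:
    score = 0
    if robots_accessible:
        score += 20
    # 30 points split evenly across the six monitored crawlers
    score += round(30 * len(allowed_crawlers) / len(AI_CRAWLERS))
    if has_llms_txt:
        score += 20
    if has_schema:
        score += 15
    if heading_count >= 3:
        score += 15
    return score
```

A site passing everything scores 100; a site with only an accessible robots.txt that blocks every AI crawler scores 20.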
Robots.txt and AI crawlers
Your robots.txt tells crawlers what they’re allowed to access. Most existing robots files were written before AI crawlers existed, and many sites accidentally block them with a wildcard Disallow: /.
CiteMetrix specifically checks whether the following AI crawlers are allowed:
| Crawler user-agent | Operated by | What it does |
|---|---|---|
| GPTBot | OpenAI | Crawls for ChatGPT training data |
| ChatGPT-User | OpenAI | Used when ChatGPT users ask it to fetch URLs |
| CCBot | Common Crawl | Open web crawl used by many AI training pipelines |
| Google-Extended | Google | Controls Gemini and Bard access (separate from regular Google search) |
| anthropic-ai / ClaudeBot | Anthropic | Crawls for Claude training and retrieval |
| PerplexityBot | Perplexity | Crawls for Perplexity AI search |
For each crawler, CiteMetrix checks whether your robots.txt explicitly disallows it via a pattern like:
```
User-agent: GPTBot
Disallow: /
```
If that pattern is present, the crawler is blocked and you lose points. If the pattern is absent (or set to allow), the crawler can access your site.
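You can run this kind of check against your own robots.txt locally with Python's standard-library robots parser. This is a sketch: `urllib.robotparser` applies standard robots.txt semantics, which may differ slightly from how individual AI crawlers interpret edge cases, and the sample file below is illustrative.

```python
# Check which AI crawlers a robots.txt allows, using the standard
# library's urllib.robotparser. The sample file blocks GPTBot only.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

User-agent: GPTBot
Disallow: /
"""

AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "CCBot",
               "Google-Extended", "anthropic-ai", "PerplexityBot"]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in AI_CRAWLERS:
    allowed = parser.can_fetch(bot, "https://yourdomain.com/")
    print(f"{bot}: {'allowed' if allowed else 'BLOCKED'}")
```

In this sample, GPTBot is blocked by its explicit `Disallow: /` group, while the other five crawlers fall back to the `User-agent: *` group and are allowed.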
Recommended robots.txt for AI visibility:
```
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: CCBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```
Adjust the Disallow patterns in the `User-agent: *` group to match anything on your site you want to keep away from crawlers (typically admin paths, draft content, etc.). The per-AI-bot blocks above use an explicit `Allow: /` to make the intent obvious; omitting a bot's entry is nearly equivalent, because an unmatched crawler falls back to whatever your `User-agent: *` group says.
llms.txt
llms.txt is an emerging convention modeled on sitemap.xml. Where a sitemap tells search engines what pages exist, llms.txt tells AI platforms what content is most important — your authoritative pages, your canonical sources, the URLs you’d want AI to prioritize when forming its understanding of your brand.
CiteMetrix checks whether https://yourdomain.com/llms.txt exists and returns a 200 response. Having one earns 20 of the 100 points for Technical Readiness.
Minimum viable llms.txt:
```
# Acme Inc.

> Acme makes asphalt rejuvenators and polymer-modified asphalt for North American highway projects. Founded 2014 in Austin, Texas.

## Core pages

- [About Acme](https://acme.com/about): Company overview, founding story, leadership
- [Products](https://acme.com/products): Full product line with technical specifications
- [Sustainability](https://acme.com/sustainability): Our environmental commitments and certifications
- [Contact](https://acme.com/contact): Sales and support contact info

## Documentation

- [Technical specifications](https://acme.com/specs): Detailed product specs
- [Application guides](https://acme.com/guides): How to use our products
```
The format is markdown. The first heading is your brand name. The blockquote (>) is a one-paragraph summary of who you are. Then markdown headings group your most-cited pages by category, with each link including a brief description.
Don’t overthink this. A short, clear llms.txt is more useful than a long, exhaustive one. AI platforms parse the file looking for “what’s the canonical signal about this brand” — a focused list of your 10-30 most important pages is the right shape.
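A quick sanity check for a draft llms.txt can be done with a few regexes: confirm there is a title heading, a summary blockquote, and a reasonable number of described links. This is a sketch, not a formal llms.txt parser; there is no official validator for the convention yet.

```python
# Minimal llms.txt sanity check: title heading, summary blockquote,
# and markdown links extracted with simple regexes (illustrative only).
import re

llms_txt = """\
# Acme Inc.
> Acme makes asphalt rejuvenators for North American highway projects.

## Core pages
- [About Acme](https://acme.com/about): Company overview
- [Products](https://acme.com/products): Full product line
"""

has_title = bool(re.search(r"^# .+", llms_txt, re.MULTILINE))
has_summary = bool(re.search(r"^> .+", llms_txt, re.MULTILINE))
links = re.findall(r"\[([^\]]+)\]\((https?://[^)]+)\)", llms_txt)

print(f"title: {has_title}, summary: {has_summary}, links: {len(links)}")
```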
Schema markup (structured data)
CiteMetrix checks your homepage for JSON-LD schema markup. Specifically, it looks for any `<script type="application/ld+json">` block in the HTML. The presence of schema is worth 15 points.
The schema types that matter most for AI visibility are:
- `Organization` — your basic company information (name, URL, logo, founder, foundingDate, address)
- `WebSite` — your site as a whole (with a `potentialAction` of type `SearchAction` for site search)
- `Article` — for individual content pages (less relevant on the homepage; matters more on blog/article pages)
- `Product` — if you sell specific products
- `FAQPage` — for FAQ-style content; highly favored by AI for retrieval
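The detection side of this check can be approximated locally: pull the JSON-LD blocks out of homepage HTML and report their @type values. The regex-based extraction below is a simplification (a production check would use a real HTML parser), and the embedded Organization snippet is a hypothetical example.

```python
# Find JSON-LD blocks in homepage HTML and report their @type values.
import json
import re

html = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Acme Inc.", "url": "https://acme.com"}
</script>
</head><body></body></html>"""

blocks = re.findall(
    r'<script type="application/ld\+json">(.*?)</script>',
    html, re.DOTALL,
)
types = [json.loads(b).get("@type") for b in blocks]
print(types)  # → ['Organization']
```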
If you’re on WordPress (as CiteMetrix is), schema generation is typically handled by your SEO plugin (Yoast, Rank Math, etc.). Most well-configured installations already have basic Organization and WebSite schema. If your check is failing, the issue is usually:
- An SEO plugin isn't installed
- Schema is being generated but is malformed (validate at validator.schema.org to confirm)
- A theme override is preventing the JSON-LD from rendering
Content structure
The third part of the check looks at the homepage HTML for basic content structure — specifically, the presence of meaningful heading tags (`<h1>` through `<h6>`).
CiteMetrix counts headings on the homepage. If there are 3 or more, the structure check passes (15 points). If fewer, it fails.
Why this matters: AI platforms favor pages with clear hierarchical structure. A homepage that’s mostly visual (a hero image, a few CTAs, no headings) might look beautiful but doesn’t give AI parsers much to work with. Adding semantic structure with proper heading tags is a low-effort win.
A few practical tips:
- One H1 per page, ideally containing your primary keyword or brand name
- H2s for major sections of the page
- H3s for subsections within those
- Avoid using headings purely for visual styling — if it shouldn't be in the document outline, use a styled `<div>` or `<p>` instead
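The heading-count check itself is easy to reproduce locally with the standard library's HTMLParser. This is a sketch of the threshold described above (3+ headings passes), not CiteMetrix's implementation; the sample HTML is hypothetical.

```python
# Count <h1>-<h6> tags in a page's HTML and apply the 3-heading
# threshold from the content-structure check.
from html.parser import HTMLParser

class HeadingCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self.count += 1

html = """<h1>Acme Inc.</h1>
<h2>What we make</h2><h3>Rejuvenators</h3><h2>Who we serve</h2>"""

counter = HeadingCounter()
counter.feed(html)
print(counter.count >= 3)  # 4 headings: passes the structure check
```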
What a perfect Technical Readiness score looks like
100 points requires:
- ✓ robots.txt is accessible (20 points)
- ✓ All 6 AI crawlers explicitly allowed (30 points)
- ✓ llms.txt is present and returns 200 (20 points)
- ✓ Homepage has JSON-LD schema markup (15 points)
- ✓ Homepage has 3+ heading tags (15 points)
Most well-managed marketing sites can achieve 85+ within a few hours of focused work. The hardest item for many operators is the AI crawler allow-list — older sites often have years of accumulated robots.txt rules that need review and cleanup.
What Technical Readiness doesn’t measure
Plenty of things matter for AI visibility that aren’t in the Technical Readiness score:
- Page speed — important for SEO generally, less important for AI crawler success specifically
- Mobile responsiveness — same
- HTTPS — assumed; the check only fetches HTTPS URLs
- Domain authority / backlinks — captured indirectly through Brand Demand and Authority Transfer
- Content depth and quality — the qualitative work that drives hallucination remediation doesn't count here
- Sitemap quality — we check for the sitemap reference in robots.txt but don’t deep-validate the sitemap itself
These are real factors. They just aren’t part of the 15% Technical Readiness slice. Improving them shows up elsewhere in your ModelScore (or in your overall AI citation rate).
Common mistakes
1. Using a “block-all” robots.txt template. Some site builders ship default robots files with broad wildcard blocks. These kill AI visibility instantly. If your Technical Readiness score is below 30 and the AI Crawler check shows everything blocked, this is the cause.
2. Hosting robots.txt in the wrong place. robots.txt must live at the root of your domain (yourdomain.com/robots.txt), not in a subdirectory. Subdirectory robots files don’t work.
3. Adding llms.txt as a regular page. llms.txt should be a plain text file at the root of your domain, served with Content-Type: text/plain (or text/markdown). If you create it as a WordPress page, the URL will likely have a different path and the file will return HTML, not the markdown the spec expects.
4. Schema markup that doesn’t validate. Run your homepage through validator.schema.org. Common issues: missing required fields (Organization without name, Article without author), structural errors (mismatched braces in the JSON), or stale schema referencing pages that no longer exist.
5. Headings used for typography only. If your H2s exist purely because the designer wanted big text, your document outline is broken. Use semantic heading levels for actual document structure; use CSS for visual hierarchy.
Next steps
- Open the Technical Readiness panel in your dashboard and review the per-check pass/fail list
- For any failed check, the panel includes a “Fix this” link with specifics
- Re-scan after each fix to confirm the score moves