Concepts

Citation Scans

How CiteMetrix queries AI platforms on your behalf, what gets stored, and how to control scan frequency.

A citation scan is the unit of work behind everything CiteMetrix shows you. Each scan asks one of your configured keywords to one of your configured AI platforms, captures the response, parses it for brand mentions and citations, and stores the result. Roll those results up over a 30-day window and you get the AI Citation component of your ModelScore. Filter to a single platform and you get that platform’s view of your brand. Diff against your brand facts and you get hallucination reports.

This article explains how scans actually run.

What gets scanned, and how often

For each AI platform you have an API key configured for, CiteMetrix runs every keyword in your queries list against that platform on a regular cadence. The cadence depends on your subscription:

Plan           Scans per day (across all domains)
Starter        100
Professional   300
Agency         1,000

A “scan” here means one keyword × one platform × one execution. So if you have 20 keywords and 5 platforms configured, a complete sweep is 100 scans. On Starter, that’s your full daily budget; on Pro you’ve got room for three sweeps a day; on Agency, ten.
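To make that budget arithmetic concrete, here is a minimal sketch. The function names are illustrative only, not part of any CiteMetrix API:

```python
def sweep_size(keywords: int, platforms: int) -> int:
    """Number of scans in one complete sweep: one per keyword x platform pair."""
    return keywords * platforms

def sweeps_per_day(daily_quota: int, keywords: int, platforms: int) -> int:
    """Complete sweeps that fit inside a plan's daily scan budget."""
    return daily_quota // sweep_size(keywords, platforms)

# The example from the text: 20 keywords x 5 platforms = a 100-scan sweep.
assert sweep_size(20, 5) == 100
assert sweeps_per_day(100, 20, 5) == 1     # Starter: one full sweep per day
assert sweeps_per_day(300, 20, 5) == 3     # Professional: three sweeps
assert sweeps_per_day(1_000, 20, 5) == 10  # Agency: ten sweeps
```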

CiteMetrix paces scans automatically — it doesn’t blast through your daily quota in five minutes and then go silent. The scheduler distributes scans across the day so your data freshness is consistent, not spiky.
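As a rough model of what even pacing means: a quota of N scans per day works out to one scan roughly every 86,400 / N seconds. A sketch, assuming a simple even-spacing scheduler (the real scheduler's logic is internal to CiteMetrix and may be more sophisticated):

```python
from datetime import timedelta

def scan_interval(daily_quota: int) -> timedelta:
    """Even spacing between scans so the daily quota is spread across 24 hours."""
    seconds_per_day = 24 * 60 * 60  # 86,400
    return timedelta(seconds=seconds_per_day / daily_quota)

# Starter (100 scans/day): one scan about every 14.4 minutes.
assert scan_interval(100) == timedelta(seconds=864)
# Agency (1,000 scans/day): one scan about every 86 seconds.
assert scan_interval(1_000) == timedelta(seconds=86.4)
```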

The three-lane cron architecture

Behind the scenes, CiteMetrix runs three independent cron lanes so that a slow operation in one category doesn’t block a fast one in another:

  • Fast lane — citation scans, accuracy checks, sentiment analysis, technical readiness checks. Three parallel workers, each with a 55-second timeout. This is the lane that matters for everyday data freshness.
  • Slow lane — competitor scans. Single worker, 180-second timeout. Competitor scans involve more queries (your competitors get scanned for the same keywords you do, plus their own), so they get their own lane to avoid blocking citation work.
  • Digest lane — weekly digest emails, monthly summaries, batch analysis. Runs every 5 minutes with a 240-second timeout.

If something goes wrong in one lane (a slow API response, a stuck job), the other lanes keep running. This is the kind of detail you don’t normally need to think about — but if you ever wonder why your competitor data is a few minutes behind your citation data, this is why.
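The lane layout above can be summarized in a configuration sketch. The shape below is hypothetical (CiteMetrix's actual configuration format is internal); the workers, timeouts, and job lists simply mirror the description above:

```python
# Hypothetical lane definitions; not CiteMetrix's real config format.
CRON_LANES = {
    "fast": {
        "jobs": ["citation_scan", "accuracy_check",
                 "sentiment_analysis", "technical_readiness"],
        "workers": 3,
        "timeout_seconds": 55,
    },
    "slow": {
        "jobs": ["competitor_scan"],
        "workers": 1,
        "timeout_seconds": 180,
    },
    "digest": {
        "jobs": ["weekly_digest", "monthly_summary", "batch_analysis"],
        "workers": 1,
        "timeout_seconds": 240,
        "interval_minutes": 5,
    },
}
```

Because the lanes are independent, a stuck 180-second competitor scan never consumes one of the fast lane's three workers.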

What CiteMetrix stores per scan

For every scan, the following gets captured to the database:

  • The full prompt sent to the AI platform
  • The full response text
  • Whether your brand was cited (mentioned by name, by domain, or by close variant)
  • The sentence(s) where the citation appeared, if any
  • Sentiment classification of those sentences (positive, neutral, negative)
  • Any URLs cited in the response
  • Whether competitors were mentioned in the same response
  • Token usage and cost (so the Spend Monitor can track real costs)

This is what powers the dashboard’s drill-down: you can click any data point and see the actual scan that produced it. There’s no hidden math — the numbers come from rows in your database.

Forcing a fresh scan

Sometimes you want to bypass the schedule and run a scan right now — typically when:

  • You’ve just added a new keyword and want to see how AI responds before tomorrow’s scheduled sweep
  • You’ve published new content and want to see whether AI is picking it up
  • You’re investigating a specific hallucination and want to confirm it’s reproducible

From the dashboard, click Run Scan in the top-right. A confirmation modal will tell you how many keywords and platforms will be scanned, and the estimated total API call count. The results land in the database the same way scheduled scans do — the only difference is you triggered it manually.

Force-rescans count against your daily quota.

A note on AI cache behavior: Even after you publish new content, AI platforms don’t immediately know about it. Most platforms refresh their training or retrieval data on schedules ranging from days to months. A force-rescan tomorrow will likely return similar results to today. CiteMetrix can show you when that data finally shifts; it can’t make the shift happen faster.

Scan failures

Scans can fail for a few reasons:

  • API key issues — the key is invalid, has been revoked, or has hit a quota cap on the provider’s side
  • Platform downtime — temporary outages at the AI platform itself
  • Rate limiting — usually transient; the next attempt typically succeeds
  • Timeout — the AI platform took too long to respond

CiteMetrix retries failed scans automatically with exponential backoff. If a scan fails three times in a row, it’s logged as a permanent failure and you’ll get a notification (email, push if you have the PWA, or both depending on settings). The dashboard’s Alerts feed shows recent scan failures with the underlying error so you can diagnose API key problems quickly.
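The retry behavior described above (exponential backoff, permanent failure after three attempts) follows a standard pattern, sketched below. This is an illustration of the pattern, not the actual CiteMetrix implementation:

```python
import time

def run_with_retries(scan, max_attempts: int = 3, base_delay: float = 1.0):
    """Run a scan callable, retrying transient failures with exponential backoff.

    Illustrative helper only. After max_attempts consecutive failures the
    final exception propagates, which is where a permanent failure would
    be logged and a notification sent.
    """
    for attempt in range(max_attempts):
        try:
            return scan()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # permanent failure: logged and surfaced in Alerts
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```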

Scan history

Every scan you’ve ever run is preserved, subject to your plan’s data retention policy (see Account Settings). The dashboard shows the most recent results by default, but you can navigate to any historical date to see what AI was saying about you that day. Useful for:

  • Tracking how a specific hallucination got fixed (or didn’t) over time
  • Showing a stakeholder before/after of your visibility on a specific platform
  • Diagnosing a sudden ModelScore drop — find the day it dropped, look at the scans that ran that day, see what changed

What scans don’t do

A few things people often expect citation scans to do that they don’t:

  • They don’t crawl your website. Scans query AI platforms, not your site. The Technical Readiness check crawls your site separately (robots.txt, llms.txt, schema markup). Different system.
  • They don’t search Google or Bing. Citation scans hit AI platforms (ChatGPT, Claude, Gemini, etc.). Search engine results are not what’s being measured. If you connect Google Search Console, CiteMetrix uses that data for the Brand Demand component — but GSC data and AI citation data are different things.
  • They don’t include images, video, or audio. Text responses only. Multi-modal AI is on the roadmap but isn’t part of citation scans today.

Next steps

  • Add or refine your keywords — better keywords produce more useful scans.
  • Connect API keys for platforms you don’t yet have configured — broader platform coverage means a more representative ModelScore.
  • Review the Hallucinations & Remediation workflow — it’s the layer of CiteMetrix that turns raw scan data into action items.
Last updated: May 7, 2026