Why We Rebuilt Our Scan Engine From Scratch — and What It Says About Trusting an AI-Visibility Tool

By Eric Richmond, Founder, CiteMetrix

Most companies don’t publish how their analytics actually work. The methodology is a black box, the score is a number you’re asked to trust, and the architecture behind it is whatever the marketing team felt like describing.

I’ve decided to go the other way with CiteMetrix. We just published our full technical methodology — including the parts that aren’t finished yet — and I want to use this post to explain one of the bigger engineering decisions behind it: why we tore out our original scan engine and rebuilt it on a completely different architecture.

It’s a story that starts with a bug I found in my own beta.

The bug I caught on my own dime

In the beta, every scan ran through keys I controlled. Beta testers weren’t entering their own API keys — they were running on mine. That detail matters for what comes next, so I’ll be precise about it: no customer was ever overcharged, because no customer’s keys were ever in play. The only account exposed to this was mine.

Here’s what I found. The original scan engine ran inside the WordPress application, coordinated by background jobs. Under certain conditions — multiple workers running in parallel, a slow scan, a timing window where two workers each believed a given query still needed to run — the same query could be sent to the same AI platform more than once. Each send was a real, paid API call. The data wasn’t duplicated in a way you’d notice in the dashboard; the cost was duplicated in a way you’d only notice on an API bill.
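To make that timing window concrete, here is a deliberately simplified, single-threaded replay of the interleaving in Python. It is not the original engine's code, which ran as WordPress background jobs. The `status` dictionary, the unit label, and `simulate_interleaving` are hypothetical names chosen for the sketch. The pattern it shows is check-then-act: both workers read "pending" before either one writes "done", so both send the query.

```python
# Hypothetical replay of a check-then-act race between two workers.
# Names and data are invented for illustration only.

def simulate_interleaving():
    status = {"q1@platform_a": "pending"}
    paid_calls = []

    # Step 1: both workers read the status before either one updates it.
    worker_a_sees = status["q1@platform_a"]
    worker_b_sees = status["q1@platform_a"]

    # Step 2: each worker acts on what it saw, so each sends the same query.
    for worker, seen in (("A", worker_a_sees), ("B", worker_b_sees)):
        if seen == "pending":
            paid_calls.append((worker, "q1@platform_a"))  # stand-in for a paid API call
            status["q1@platform_a"] = "done"

    return paid_calls


calls = simulate_interleaving()
print(len(calls))  # 2
```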

I noticed on the API bill.

Because it was my bill, in my beta, the lesson was cheap. But it taught me something I couldn’t un-see: the architecture itself made that failure possible. The duplicate-call problem wasn’t a typo I could patch — it was a structural property of running a job coordinator inside a web application that was never designed to be one. I could keep patching symptoms, or I could fix the architecture so the failure mode couldn’t exist.

I chose to rebuild.

What “rebuild it so the bug is impossible” actually means

The redesign is built on one principle: a scan should cost exactly one paid API call per query-and-platform combination — and the system should make any other outcome structurally impossible, not merely unlikely.

Here’s how the new pipeline works, in plain terms:

One work unit = one API call. When a scan runs, it’s broken into individual work units — one for each (query × platform) pair. Each unit is handled independently.
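A minimal Python sketch of that fan-out, assuming a scan is simply a list of queries and a list of platforms. `WorkUnit`, `plan_scan`, and the sample query and platform strings are illustrative names, not CiteMetrix's actual schema.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class WorkUnit:
    """One (query x platform) pair; corresponds to exactly one paid API call."""
    scan_id: str
    query: str
    platform: str


def plan_scan(scan_id, queries, platforms):
    # Cartesian product: every query is paired with every platform once.
    return [WorkUnit(scan_id, q, p) for q, p in product(queries, platforms)]


units = plan_scan(
    "scan-001",
    ["best crm for startups", "crm pricing"],
    ["platform_a", "platform_b", "platform_c"],
)
print(len(units))  # 6
```

With nine platforms, for example, a scan of 24 queries fans out to 9 × 24 = 216 independent units, each of which maps to one paid call.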

Every work unit carries a unique key. Before any work unit can make a paid API call, the system must first record that key in a database table that enforces uniqueness on it. If the same unit is ever delivered twice (a normal, expected event in any distributed queue), the second attempt simply cannot create a second call. The database refuses it.
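Here is one way to express that guarantee, sketched in Python with the standard-library `sqlite3` module standing in for whatever database the production pipeline actually uses. The `api_calls` table, the `unit_key` format, and the `claim` helper are assumptions for illustration. A delivery must insert its key before it may call the platform, and the `UNIQUE` constraint, not application logic, rejects the second insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_calls (unit_key TEXT NOT NULL UNIQUE, raw_response TEXT)")


def unit_key(scan_id, query, platform):
    # Hypothetical key format: deterministic, so a redelivered unit yields the same key.
    return f"{scan_id}|{query}|{platform}"


def claim(conn, key):
    """Return True only for the delivery that wins the right to make the paid call."""
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.execute("INSERT INTO api_calls (unit_key) VALUES (?)", (key,))
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint refused a second row for the same key.
        return False


key = unit_key("scan-001", "crm pricing", "platform_a")
print(claim(conn, key))  # True: first delivery may call the platform
print(claim(conn, key))  # False: redelivery cannot create a second call
```

This sketch covers only the deduplication step; how a claimed unit is retried if its call later fails is a separate design question not modeled here.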

That’s the whole trick, and it’s deliberately boring. Duplicate billing isn’t guarded against with careful code that I have to keep getting right. It’s made impossible by a constraint the database enforces whether my code is perfect or not. The difference between “we’re careful” and “it can’t happen” is the difference between hoping and knowing.

Capture first, interpret second. The new engine also separates the expensive part from the smart part. One stage makes the API call and stores the complete raw response, untouched. A second stage reads that stored response and derives the analysis — whether the brand was cited, the sentiment, the position, and so on. They’re decoupled by a queue.

This sounds like a minor implementation detail. It isn’t. It means that when I improve the analysis logic, I can re-run it against responses I already paid for — without paying for them again. The raw data is captured once and kept. Better methodology later doesn’t cost another round of API spend.
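The capture/interpret split can be sketched in Python as two stages sharing a table of stored raw responses. This is a simplified model under stated assumptions: `fake_platform_call` stands in for a paid API call, the two `analyze_*` rules are toy examples, and the queue between the stages is omitted. It shows that swapping in new analysis logic re-reads stored data without making another call.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_responses (unit_key TEXT NOT NULL UNIQUE, body TEXT NOT NULL)")

paid_calls = 0  # counts simulated paid calls


def fake_platform_call(query):
    """Stand-in for a paid API call; returns a canned response."""
    global paid_calls
    paid_calls += 1
    return {"query": query, "text": "Acme CRM is a popular choice for startups."}


def capture(unit_key, query):
    """Stage 1: make the call once and store the raw response untouched."""
    body = fake_platform_call(query)
    with conn:
        conn.execute("INSERT INTO raw_responses VALUES (?, ?)", (unit_key, json.dumps(body)))


def analyze_v1(body, brand):
    # Original toy rule: case-sensitive brand match.
    return {"cited": brand in body["text"]}


def analyze_v2(body, brand):
    # "Improved" toy rule: case-insensitive brand match.
    return {"cited": brand.lower() in body["text"].lower()}


def interpret(analyzer, brand):
    """Stage 2: derive analysis from stored responses only; no API calls here."""
    rows = conn.execute("SELECT unit_key, body FROM raw_responses").fetchall()
    return {key: analyzer(json.loads(body), brand) for key, body in rows}


capture("scan-001|crm for startups|platform_a", "crm for startups")
print(interpret(analyze_v1, "ACME"))  # cited: False under the old rule
print(interpret(analyze_v2, "ACME"))  # cited: True after re-running the new rule
print(paid_calls)                      # 1
```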

Failures stay contained. If one platform times out, or one query misbehaves, it affects only that one work unit. The other platforms for that query, and every other query, are untouched. A unit that fails transiently is retried; one that fails repeatedly is set aside for inspection rather than lost or retried forever.
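A minimal Python sketch of per-unit containment with bounded retries and a set-aside list (often called a dead-letter queue). `MAX_ATTEMPTS = 3`, the rule that every exception counts as retryable, and the `flaky_handler` test double are assumptions for this example; a production queue would typically also add backoff between attempts.

```python
MAX_ATTEMPTS = 3  # assumed retry budget for this sketch


def process_units(units, handler):
    """Run each unit on its own; retry failures, then set persistent ones aside."""
    results, dead_letter = {}, []
    for unit in units:
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                results[unit] = handler(unit)
                break  # success: stop retrying this unit
            except Exception as exc:  # every error counts as retryable here
                if attempt == MAX_ATTEMPTS:
                    # Keep the unit and its last error for inspection.
                    dead_letter.append((unit, repr(exc)))
    return results, dead_letter


# Usage: one unit times out once, another times out every time.
attempts = {}


def flaky_handler(unit):
    attempts[unit] = attempts.get(unit, 0) + 1
    if unit == "q2@platform_b":
        raise TimeoutError("always times out")
    if unit == "q1@platform_b" and attempts[unit] == 1:
        raise TimeoutError("times out once")
    return f"response for {unit}"


results, dead = process_units(
    ["q1@platform_a", "q1@platform_b", "q2@platform_b"], flaky_handler
)
print(sorted(results))             # ['q1@platform_a', 'q1@platform_b']
print([unit for unit, _ in dead])  # ['q2@platform_b']
```

The failing unit exhausts its own retries without blocking the other two, which mirrors the containment described above.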

Why this is different from how most of the market builds

I won’t name competitors, partly because I can’t verify their internals from the outside, and I’m not going to characterize architecture I haven’t seen. But I can describe the common pattern, because it’s the pattern I started with.

Most visibility tools run their scanning inside a single application — the same app that serves the dashboard, handles logins, and renders reports. It’s the fastest way to ship a first version. I know, because that’s exactly what I did.

The problem is that scanning is a fundamentally different kind of work from serving a web page. A web request should take milliseconds. A scan across nine AI platforms for dozens of queries takes minutes, makes hundreds of paid external calls (one per query-and-platform pair), and needs to survive partial failures, retries, and rate limits. Cramming that into a web application’s request-and-job model is how you get the exact class of bug I described, and a dozen others waiting behind it.

Moving the scan engine out into a purpose-built, queue-decoupled pipeline isn’t architectural vanity. It’s matching the tool to the job. The dashboard does what dashboards are good at. The pipeline does what pipelines are good at. Neither pretends to be the other.

The part I care about most: you can audit any of this

Here’s why I’m telling you all of this instead of just telling you our score is trustworthy.

A visibility score is only as credible as the method behind it. If I ask you to trust a number, I owe you the ability to check how it’s produced. So our methodology is published — the four components of our composite score, the exact weights, the data source behind each one, and crucially, the limitations. We say plainly what the tool measures and what it doesn’t. We say plainly that our accuracy engine uses an AI model to evaluate AI responses, and that findings are surfaced for human review. We say plainly which security certifications are documented versus formally audited.

That last part is the test. Anyone can publish the flattering parts of their methodology. The signal of an honest tool is whether it also publishes the unflattering parts — the boundaries, the dependencies, the things still on the roadmap. We do, on purpose.

If you’re evaluating any AI-visibility tool — ours or anyone’s — that’s the question I’d put at the top of your list: will they show you how the number is made, including where it’s weak? If the answer is no, the number is a marketing asset, not a measurement.

We’ll show you ours.

You can read the full CiteMetrix technical methodology here: https://citemetrix.com/technical-methodology-data-collection-architecture/. It covers the scan architecture, our nine-platform coverage, how ModelScore is computed, our Brand Facts accuracy engine, our BYOK key-isolation model, and an explicit section on scope and limitations. If you have technical questions or want a working demonstration, reach me at eric@citemetrix.com.