Methodology

How We Rank ATS Platforms.

Every score on AI Consensus Index is the product of a structured, repeatable process — four AI models, one standardised prompt, nine scored dimensions, and a human editorial layer that verifies facts without touching numbers. This page documents that process in full. If you are evaluating whether to trust our rankings, this is where to start.

📅 Methodology Version: March 2026

🤖 4 AI Models Used

📊 9 Scored Dimensions

✍️ Human-Verified, Not Human-Scored

1 The Evaluation Process at a Glance

Our methodology was designed around a single constraint: the scores had to be structurally protected from commercial influence. That meant the people responsible for editorial content could not be the same people producing the scores. The solution was to make the scoring machine-generated and the process transparent enough that any reader could audit whether it had been followed.

The process runs in five sequential steps. Each step is described in detail in the sections below.

Platform Selection

Platforms are selected for inclusion based on market relevance, search visibility, and geographic applicability to our target audience. Commercial relationships play no role in selection decisions.

Prompt Submission

A standardised evaluation prompt is submitted independently to each of the four AI models. Models receive identical instructions and are not shown each other's outputs at any stage.

Score Recording

Dimension scores from each model are recorded exactly as produced. No rounding, normalisation, or adjustment is applied. The Consensus Score is the arithmetic mean of all four model outputs across the nine dimensions.

Editorial Verification

Human editors cross-reference the AI-generated product descriptions against publicly available documentation — vendor websites, pricing pages, compliance certifications, and third-party review data — to identify and correct material factual errors. Scores are not part of this review.

Publication and Maintenance

Reviews are published with full model score breakdowns visible. The index is re-evaluated on a rolling cycle. When a platform undergoes significant product, pricing, or compliance changes, its evaluation is queued for re-run.

2 The Evaluation Prompt

The prompt below is submitted verbatim to each AI model for every platform evaluation. The only variable that changes between evaluations is the product name in the final line. Nothing else is altered — not the framing, not the dimension order, not the instructions on scoring distribution. This is the complete prompt as used in the March 2026 index cycle.

Why we publish the prompt

Publishing the prompt is not standard practice in the review industry. We do it because it is the most direct way to demonstrate that no vendor receives preferential framing. Any reader can submit this prompt to any of the four models themselves and compare the output to what we have published. Discrepancies are grounds for a legitimate correction request.

Evaluation Prompt — March 2026

Verbatim

You are a senior HR technology research analyst writing an independent review for the AI Consensus Index, a multi-model AI evaluation platform. Your review will be published alongside reviews from other AI models and compared directly. Write with authority, precision, and without marketing language. Be willing to criticize where warranted.

Base the analysis on publicly available information about the product, industry knowledge, and typical ATS capabilities. If specific details are uncertain, state the assumption rather than inventing features.

Review this ATS product: [Product Name]

Use exactly this structure:

OVERVIEW
A 3–4 sentence introduction covering what this ATS is, who makes it, and what market segment it targets.

BEST FOR
One clear sentence. Who is the ideal user or company for this product?

PRICING SUMMARY
Summarize the pricing tiers, contract requirements, and overall value positioning. Note any pricing transparency issues.

STANDOUT FEATURES
3 to 5 features that genuinely differentiate this product from competitors. Be specific, not generic.

SCORED DIMENSIONS
Score each dimension out of 10 with 2–3 sentences of justification per score. Do not round all scores to similar numbers — differentiate clearly based on actual product strengths and weaknesses.

Ease of Use: X/10
AI & Automation Features: X/10
Integrations: X/10
Pricing & Value: X/10
Customer Support: X/10
Scalability: X/10
Reporting & Analytics: X/10
Compliance: X/10
Performance / Time to Hire Impact: X/10

OVERALL SCORE: X/10
The arithmetic mean of the 9 dimension scores above.

PROS
4 to 6 bullet points. Specific and evidence-based, not generic praise.

CONS
3 to 5 bullet points. Be direct. Do not soften legitimate weaknesses.

VERDICT
A 4–5 sentence closing recommendation. State clearly who should buy this, who should avoid it, and whether the product represents good value in the current ATS market. End with one sentence on its outlook for 2026 and beyond.

3 The Nine Scored Dimensions

The dimensions were chosen to reflect the decision criteria most relevant to our target buyer: HR Directors, founders, and operational leads at startups and SMBs making a first or second ATS purchase, often without a dedicated procurement function. Each dimension is described below so that readers understand what is and is not being measured.

Dimension 01

Ease of Use

Onboarding time, interface clarity, and realistic learning curve for non-technical HR staff and occasional hiring manager users — not just recruiter power users.

Dimension 02

AI & Automation

Native, production-ready AI capabilities: resume screening, candidate scoring, workflow automation, generative features. Roadmap announcements and beta features do not count toward this score.

Dimension 03

Integrations

Breadth and documented reliability of native integrations with HRIS, background screening, video interviewing, job distribution platforms, and payroll systems.

Dimension 04

Pricing & Value

Transparency of published pricing, total cost of ownership at realistic deployment scale, presence of hidden add-on costs, and overall value relative to the feature set delivered.

Dimension 05

Customer Support

Channel availability, response quality at standard contract tiers (not just enterprise), and access to dedicated success resources without premium uplift requirements.

Dimension 06

Scalability

The platform's documented ability to handle growth in headcount, requisition volume, multi-location structures, and organisational complexity without degradation or migration forcing.

Dimension 07

Reporting & Analytics

Depth of native pipeline analytics, source attribution, custom reporting capability, and access to DEI or workforce intelligence data without requiring a separate BI tool.

Dimension 08

Compliance

GDPR, SOC 2, OFCCP, CCPA, and regional regulatory framework support. Weight is given to automation of compliance workflows rather than the existence of manual tooling.

Dimension 09

Performance / Time to Hire

Documented or credibly attributed impact on hiring cycle duration, coordinator overhead, and candidate drop-off rates. Vendor-supplied statistics without third-party corroboration are treated with scepticism.

On score distribution

The prompt explicitly instructs models not to cluster scores. A platform that is strong across most dimensions but has a material weakness in one area should receive a low score in that dimension — not a softened 7.0. Readers can compare dimension scores to identify a platform's specific strengths and weaknesses, not just its overall position.

4 The Four AI Models

Model selection was governed by two criteria: public availability at the time of evaluation, and demonstrated capability on structured analytical tasks. The four models used in the current index cycle are listed below. Each brings a different weighting toward different types of evidence, which is part of why aggregation across four models produces a more reliable output than any single model alone.

Gemini

Google DeepMind

Tends toward balanced, well-sourced assessments with particular depth on enterprise software market positioning and integration ecosystem breadth.

Grok

xAI

Produces direct, opinionated assessments with a tendency to weight user experience friction and pricing transparency heavily in its scoring.

ChatGPT

OpenAI

Provides structured, methodical scoring with reliable coverage of compliance requirements, integration depth, and enterprise-tier feature comparisons.

Claude

Anthropic

Applies a cautious, evidence-anchored approach. Consistently produces the most conservative scores in the consensus set — scores we have found to be the most reliably calibrated over time.

Why four models and not one

No single AI model has complete or perfectly balanced knowledge of the ATS market. Each reflects the distribution of information available in its training data. Aggregating across four independent models reduces the impact of any single model's blind spots, overconfidence, or training data gaps. Where all four models agree, the score is robust. Where they diverge significantly, that variance is itself informative — it usually reflects genuine ambiguity about a platform's positioning or a recent product change that some models have more exposure to than others.

5 How Scores Are Calculated

The Consensus Score for each platform is the arithmetic mean of the four individual model scores, which are themselves each the arithmetic mean of that model's nine dimension scores. The calculation is straightforward by design — no dimension is weighted above any other, and no model's output is weighted above any other.

Dimensions scored per model

Models evaluated per platform

Individual data points per platform

5–10

Possible score range

Scores are displayed to two decimal places. No rounding to the nearest half-point is applied. A platform scoring 7.87 scores 7.87 — not 7.9 or 8.0. This precision matters when comparing platforms whose consensus scores sit close together in the rankings.

The index is sorted in descending order by Consensus Score. Where two platforms share an identical score to two decimal places, they are listed alphabetically. Rank positions are recalculated after every re-evaluation cycle.

6 The Role of Human Editorial Oversight

Human editorial oversight exists to protect the accuracy of factual claims — not to influence scores. The boundary between what editors can and cannot do is precise and structural.

Editors are responsible for:

Prompt integrity. Ensuring the evaluation prompt is applied consistently across all platforms and does not inadvertently advantage particular vendor categories through framing.
Factual verification. Cross-referencing AI-generated product descriptions against publicly available documentation — vendor websites, pricing pages, compliance certifications, and credible third-party review sources — to identify and flag material inaccuracies.
Contextual commentary. Writing the Overview, Best For, Pricing Summary, and Verdict sections that appear on review pages. These sections reflect editorial judgment and are clearly attributed as editorial content, not AI scoring output.
Index maintenance. Deciding which platforms to add or remove from the index based on market relevance and audience applicability, not commercial criteria.

Editors cannot:

Modify, round, re-weight, or otherwise alter the dimension scores or Consensus Score produced by the AI models.
Re-run an evaluation to obtain a higher score for a platform with a commercial relationship.
Suppress or delay publication of a low-scoring review at a vendor's request.
Apply subjective adjustments to compensate for perceived model error without re-running the full evaluation under a documented prompt revision.

The non-negotiable rule

If a score is considered incorrect — because of a factual error in the underlying data, a material product change since evaluation, or a documented prompt flaw — the correct response is to re-run the evaluation with an improved prompt and publish both the old and new scores with a change note. Manual adjustment of a published score is not permitted under any circumstance.

7 Re-Evaluation Cadence and Version Control

Scores are not permanent. The ATS market moves quickly — pricing structures change, AI features are added, compliance certifications lapse or are obtained, and acquisitions alter product trajectories. Our policy is to re-evaluate the full index on a rolling cycle and to queue individual platforms for early re-evaluation when a material change is confirmed.

Triggers for an out-of-cycle re-evaluation include:

Confirmed pricing tier changes that materially alter the value assessment.
Major product releases that add or remove features scored in the current evaluation.
Ownership or acquisition events that change the platform's strategic direction or support model.
Verified factual errors in the current review that affect the integrity of the scored output.
New compliance certifications or documented compliance failures that alter the compliance dimension score.

Each review page carries a "Reviewed" date in the subtitle. This date reflects the most recent evaluation cycle for that platform. The index page carries a separate "Updated" date that reflects the most recent full-index re-run.

8 Known Limitations of This Methodology

We document our methodology's limitations because we believe informed scepticism from readers makes the index more credible, not less.

Training data recency. AI models have knowledge cutoffs. For platforms that have undergone significant product changes close to or after a model's training cutoff, scores may not fully reflect the current product state. Editorial verification partially compensates for this, but does not eliminate it.
Regional coverage bias. All four models have stronger coverage of North American and European ATS vendors than of Asia-Pacific-native platforms. Scores for regional vendors such as Darwinbox, Keka, and Manatal may carry slightly higher uncertainty than scores for globally established platforms.
No hands-on product testing. Our scores are based on publicly available information, not first-hand product trials. User experience dimensions in particular — ease of use, support quality — rely on the models' synthesis of publicly available review data, which has its own biases toward larger platforms with more review volume.
Equal dimension weighting. All nine dimensions contribute equally to the Consensus Score. A buyer who values compliance above all other dimensions should weight the compliance score accordingly in their own analysis rather than relying solely on the consensus average.

Our recommendation

Use the Consensus Score as an efficient first filter, not a final decision. Use the dimension scores to identify which platforms align with your specific priorities. Then request demonstrations, conduct your own reference checks, and negotiate commercial terms before committing. No ranking methodology — including ours — substitutes for direct vendor evaluation at the procurement stage.

9 Corrections Policy

We maintain a corrections policy because publishing at this scale produces errors, and how a publication handles errors is as much a trust signal as the errors themselves.

A correction is warranted when a published review contains a demonstrably incorrect statement of fact — a wrong pricing figure, a mischaracterised feature, an inaccurate compliance status — that can be verified against publicly available documentation. Corrections are processed as follows:

Minor factual corrections — errors in descriptive text that do not affect dimension scores — are updated in the review with a correction note and date.
Material factual errors — errors that, if corrected, would likely change one or more dimension scores — trigger a full re-evaluation of the affected platform under the standard prompt. Both the previous and updated scores are published with an explanation of what changed and why.
Disputed assessments — where a vendor disagrees with a score or verdict but cannot point to a factual error — are not actioned as corrections. We document the dispute and our reasoning, but we do not alter scores on the basis of vendor disagreement alone.

Correction requests can be submitted via the contact address in the site footer. We respond to all substantive correction requests within 10 business days and publish a decision either way.

← Back to Rankings