Methodology

How We Rank ATS Platforms.

Every score on the AI Consensus Index is the product of a structured, repeatable process — four AI models, one standardised prompt, nine scored dimensions, and a human editorial layer that verifies facts without touching numbers. This page documents that process in full. If you are evaluating whether to trust our rankings, this is where to start.

📅 Methodology Version: March 2026
🤖 4 AI Models Used
📊 9 Scored Dimensions
✍️ Human-Verified, Not Human-Scored

1 The Evaluation Process at a Glance

Our methodology was designed around a single constraint: the scores had to be structurally protected from commercial influence. That meant the people responsible for editorial content could not be the same people producing the scores. The solution was to make the scoring machine-generated and the process transparent enough that any reader could audit whether it had been followed.

The process runs in five sequential steps. Each step is described in detail in the sections below.

1. Platform Selection
Platforms are selected for inclusion based on market relevance, search visibility, and geographic applicability to our target audience. Commercial relationships play no role in selection decisions.
2. Prompt Submission
A standardised evaluation prompt is submitted independently to each of the four AI models. Models receive identical instructions and are not shown each other's outputs at any stage.
3. Score Recording
Dimension scores from each model are recorded exactly as produced. No rounding, normalisation, or adjustment is applied. The Consensus Score is the arithmetic mean of all four model outputs across the nine dimensions.
4. Editorial Verification
Human editors cross-reference the AI-generated product descriptions against publicly available documentation — vendor websites, pricing pages, compliance certifications, and third-party review data — to identify and correct material factual errors. Scores are not part of this review.
5. Publication and Maintenance
Reviews are published with full model score breakdowns visible. The index is re-evaluated on a rolling cycle. When a platform undergoes significant product, pricing, or compliance changes, its evaluation is queued for re-run.

2 The Evaluation Prompt

The prompt below is submitted verbatim to each AI model for every platform evaluation. The only variable that changes between evaluations is the product name in the final line. Nothing else is altered — not the framing, not the dimension order, not the instructions on scoring distribution. This is the complete prompt as used in the March 2026 index cycle.

Why we publish the prompt

Publishing the prompt is not standard practice in the review industry. We do it because it is the most direct way to demonstrate that no vendor receives preferential framing. Any reader can submit this prompt to any of the four models themselves and compare the output to what we have published. Discrepancies are grounds for a legitimate correction request.

Evaluation Prompt — March 2026
Verbatim
You are a senior HR technology research analyst writing an independent review for the AI Consensus Index, a multi-model AI evaluation platform. Your review will be published alongside reviews from other AI models and compared directly. Write with authority, precision, and without marketing language. Be willing to criticize where warranted.

Base the analysis on publicly available information about the product, industry knowledge, and typical ATS capabilities. If specific details are uncertain, state the assumption rather than inventing features.

Review this ATS product: [Product Name]

Use exactly this structure:

OVERVIEW
A 3–4 sentence introduction covering what this ATS is, who makes it, and what market segment it targets.

BEST FOR
One clear sentence. Who is the ideal user or company for this product?

PRICING SUMMARY
Summarize the pricing tiers, contract requirements, and overall value positioning. Note any pricing transparency issues.

STANDOUT FEATURES
3 to 5 features that genuinely differentiate this product from competitors. Be specific, not generic.

SCORED DIMENSIONS
Score each dimension out of 10 with 2–3 sentences of justification per score. Do not round all scores to similar numbers — differentiate clearly based on actual product strengths and weaknesses.

  Ease of Use: X/10
  AI & Automation Features: X/10
  Integrations: X/10
  Pricing & Value: X/10
  Customer Support: X/10
  Scalability: X/10
  Reporting & Analytics: X/10
  Compliance: X/10
  Performance / Time to Hire Impact: X/10

OVERALL SCORE: X/10
The arithmetic mean of the 9 dimension scores above.

PROS
4 to 6 bullet points. Specific and evidence-based, not generic praise.

CONS
3 to 5 bullet points. Be direct. Do not soften legitimate weaknesses.

VERDICT
A 4–5 sentence closing recommendation. State clearly who should buy this, who should avoid it, and whether the product represents good value in the current ATS market. End with one sentence on its outlook for 2026 and beyond.

3 The Nine Scored Dimensions

The dimensions were chosen to reflect the decision criteria most relevant to our target buyer: HR Directors, founders, and operational leads at startups and SMBs making a first or second ATS purchase, often without a dedicated procurement function. Each dimension is described below so that readers understand what is and is not being measured.

Dimension 01: Ease of Use
Onboarding time, interface clarity, and realistic learning curve for non-technical HR staff and occasional hiring manager users — not just recruiter power users.
Dimension 02: AI & Automation
Native, production-ready AI capabilities: resume screening, candidate scoring, workflow automation, generative features. Roadmap announcements and beta features do not count toward this score.
Dimension 03: Integrations
Breadth and documented reliability of native integrations with HRIS, background screening, video interviewing, job distribution platforms, and payroll systems.
Dimension 04: Pricing & Value
Transparency of published pricing, total cost of ownership at realistic deployment scale, presence of hidden add-on costs, and overall value relative to the feature set delivered.
Dimension 05: Customer Support
Channel availability, response quality at standard contract tiers (not just enterprise), and access to dedicated success resources without premium uplift requirements.
Dimension 06: Scalability
The platform's documented ability to handle growth in headcount, requisition volume, multi-location structures, and organisational complexity without degradation or a forced migration.
Dimension 07: Reporting & Analytics
Depth of native pipeline analytics, source attribution, custom reporting capability, and access to DEI or workforce intelligence data without requiring a separate BI tool.
Dimension 08: Compliance
GDPR, SOC 2, OFCCP, CCPA, and regional regulatory framework support. Weight is given to automation of compliance workflows rather than the existence of manual tooling.
Dimension 09: Performance / Time to Hire
Documented or credibly attributed impact on hiring cycle duration, coordinator overhead, and candidate drop-off rates. Vendor-supplied statistics without third-party corroboration are treated with scepticism.
On score distribution

The prompt explicitly instructs models not to cluster scores. A platform that is strong across most dimensions but has a material weakness in one area should receive a low score in that dimension — not a softened 7.0. Readers can compare dimension scores to identify a platform's specific strengths and weaknesses, not just its overall position.

4 The Four AI Models

Model selection was governed by two criteria: public availability at the time of evaluation, and demonstrated capability on structured analytical tasks. The four models used in the current index cycle are listed below. Each brings a different weighting toward different types of evidence, which is part of why aggregation across four models produces a more reliable output than any single model alone.

Gemini (Google DeepMind)
Tends toward balanced, well-sourced assessments with particular depth on enterprise software market positioning and integration ecosystem breadth.
Grok (xAI)
Produces direct, opinionated assessments with a tendency to weight user experience friction and pricing transparency heavily in its scoring.
ChatGPT (OpenAI)
Provides structured, methodical scoring with reliable coverage of compliance requirements, integration depth, and enterprise-tier feature comparisons.
Claude (Anthropic)
Applies a cautious, evidence-anchored approach. Consistently produces the most conservative scores in the consensus set — scores we have found to be the most reliably calibrated over time.
Why four models and not one

No single AI model has complete or perfectly balanced knowledge of the ATS market. Each reflects the distribution of information available in its training data. Aggregating across four independent models reduces the impact of any single model's blind spots, overconfidence, or training data gaps. Where all four models agree, the score is robust. Where they diverge significantly, that variance is itself informative — it usually reflects genuine ambiguity about a platform's positioning or a recent product change that some models have more exposure to than others.

5 How Scores Are Calculated

The Consensus Score for each platform is the arithmetic mean of the four individual model scores, each of which is itself the arithmetic mean of that model's nine dimension scores. The calculation is straightforward by design — no dimension is weighted above any other, and no model's output is weighted above any other.
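The mean-of-means calculation can be sketched in a few lines of Python. The model names match those used in this index; the dimension scores below are purely illustrative examples, not figures from any published review:

```python
def model_overall(dimension_scores: list[float]) -> float:
    """Arithmetic mean of one model's nine dimension scores."""
    assert len(dimension_scores) == 9
    return sum(dimension_scores) / len(dimension_scores)

def consensus_score(per_model: dict[str, list[float]]) -> float:
    """Unweighted mean of the four per-model overall scores."""
    assert len(per_model) == 4
    overalls = [model_overall(scores) for scores in per_model.values()]
    return sum(overalls) / len(overalls)

# Illustrative dimension scores only (Ease of Use ... Performance / Time to Hire).
scores = {
    "Gemini":  [8, 7, 9, 6, 7, 8, 7, 8, 7],
    "Grok":    [7, 8, 8, 5, 6, 7, 7, 8, 7],
    "ChatGPT": [8, 8, 8, 6, 7, 8, 8, 9, 7],
    "Claude":  [7, 7, 8, 5, 6, 7, 7, 8, 6],
}
print(f"{consensus_score(scores):.2f}")  # displayed to two decimal places
```

Because every model scores the same nine dimensions, the mean of the four per-model means is identical to the mean of all 36 individual data points.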

- 9 dimensions scored per model
- 4 models evaluated per platform
- 36 individual data points per platform
- 0–10 possible score range

Scores are displayed to two decimal places. No rounding to the nearest half-point is applied. A platform scoring 7.87 scores 7.87 — not 7.9 or 8.0. This precision matters when comparing platforms whose consensus scores sit close together in the rankings.

The index is sorted in descending order by Consensus Score. Where two platforms share an identical score to two decimal places, they are listed alphabetically. Rank positions are recalculated after every re-evaluation cycle.
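The ordering rule can be sketched the same way: sort descending by Consensus Score compared at two decimal places, breaking ties alphabetically. The platform names and scores here are hypothetical:

```python
# Hypothetical consensus scores, already computed to two decimal places.
consensus = {
    "Acme ATS": 7.87,
    "HireFast": 8.12,
    "TalentBox": 7.87,
    "RecruitCo": 6.94,
}

ranked = sorted(
    consensus.items(),
    # Negate the two-decimal score for descending order; the name breaks ties A-Z.
    key=lambda item: (-round(item[1], 2), item[0]),
)

for position, (name, score) in enumerate(ranked, start=1):
    print(f"{position}. {name}: {score:.2f}")
```

With these inputs the two 7.87 platforms tie, so Acme ATS lists ahead of TalentBox on the alphabetical tie-break.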

6 The Role of Human Editorial Oversight

Human editorial oversight exists to protect the accuracy of factual claims — not to influence scores. The boundary between what editors can and cannot do is precise and structural.

Editors are responsible for:

- Cross-referencing AI-generated product descriptions against publicly available documentation, including vendor websites, pricing pages, compliance certifications, and third-party review data
- Correcting material factual errors in published review text
- Queuing an evaluation for re-run when a verified error affects the data the models scored against

Editors cannot:

- Change, round, or otherwise adjust any dimension, model, or consensus score
- Rewrite or selectively omit model outputs, which are recorded exactly as produced
- Alter the evaluation prompt for an individual platform

The non-negotiable rule

If a score is considered incorrect — because of a factual error in the underlying data, a material product change since evaluation, or a documented prompt flaw — the correct response is to re-run the evaluation with an improved prompt and publish both the old and new scores with a change note. Manual adjustment of a published score is not permitted under any circumstance.

7 Re-Evaluation Cadence and Version Control

Scores are not permanent. The ATS market moves quickly — pricing structures change, AI features are added, compliance certifications lapse or are obtained, and acquisitions alter product trajectories. Our policy is to re-evaluate the full index on a rolling cycle and to queue individual platforms for early re-evaluation when a material change is confirmed.

Triggers for an out-of-cycle re-evaluation include:

- A material change to published pricing or packaging
- Launch of significant new AI or product capabilities
- A compliance certification being obtained, lapsing, or being revoked
- An acquisition or other corporate event that alters the product's trajectory

Each review page carries a "Reviewed" date in the subtitle. This date reflects the most recent evaluation cycle for that platform. The index page carries a separate "Updated" date that reflects the most recent full-index re-run.

8 Known Limitations of This Methodology

We document our methodology's limitations because we believe informed scepticism from readers makes the index more credible, not less.

- The models evaluate from publicly available information and their training data, not hands-on product testing.
- Each model's knowledge has a cutoff, so very recent product changes may not be reflected until a re-evaluation is run.
- Models inherit the gaps and biases of their training data; aggregating four models reduces, but does not eliminate, this effect.

Our recommendation

Use the Consensus Score as an efficient first filter, not a final decision. Use the dimension scores to identify which platforms align with your specific priorities. Then request demonstrations, conduct your own reference checks, and negotiate commercial terms before committing. No ranking methodology — including ours — substitutes for direct vendor evaluation at the procurement stage.

9 Corrections Policy

We maintain a corrections policy because publishing at this scale produces errors, and how a publication handles errors is as much a trust signal as the errors themselves.

A correction is warranted when a published review contains a demonstrably incorrect statement of fact — a wrong pricing figure, a mischaracterised feature, an inaccurate compliance status — that can be verified against publicly available documentation. Corrections are processed as follows:

- The claim is checked against the public documentation supplied with the request
- Confirmed factual errors in the review text are corrected by editors
- If the error affects the data a score was based on, the evaluation is re-run and both the old and new scores are published with a change note

Correction requests can be submitted via the contact address in the site footer. We respond to all substantive correction requests within 10 business days and publish a decision either way.
