AMI Data Quality Flags Explained: A, B, C

The AMI publishes a data quality flag alongside each country's scores. This guide explains what the flags mean, how they are assigned, and how to use them in cross-country comparisons.

TL;DR

Each AMI country carries a data quality flag: A (5+ dimensions from live data), B (3–4 live), C (<3 live). All 39 countries in v1.5 currently carry the A flag. The flag is a measurement-confidence indicator separate from the P or R score values.

The flag system

The data quality flag indicates how many of the six P-Score dimensions are scored from country-specific live data sources:

  • A — 5 or more dimensions from live country-specific data
  • B — 3 or 4 dimensions from live country-specific data
  • C — fewer than 3 dimensions from live country-specific data

The flag is published in the dataset alongside the P-Score, R-Score, and dimension breakdowns.
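
The thresholds above can be written as a small lookup. This is a minimal sketch rather than AMI's own code: the function name `assign_flag` and its argument are hypothetical, and only the A/B/C cut-offs come from the flag definitions.

```python
def assign_flag(live_dimensions: int, total_dimensions: int = 6) -> str:
    """Map the count of live-scored P-Score dimensions to a flag.

    Hypothetical helper; only the thresholds (5+, 3-4, fewer than 3)
    are taken from the flag definitions.
    """
    if not 0 <= live_dimensions <= total_dimensions:
        raise ValueError("live_dimensions must be between 0 and total_dimensions")
    if live_dimensions >= 5:
        return "A"
    if live_dimensions >= 3:
        return "B"
    return "C"


if __name__ == "__main__":
    for count in range(7):
        print(count, assign_flag(count))
```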

What counts as live data

The AMI methodology defines live data as sources that:

  1. Provide country-specific signal (not regional extrapolation)
  2. Are based on current observations (not historical priors)
  3. Come from one of the AMI's primary measurement infrastructures

The primary measurement infrastructures are:

  • Google Trends API — for D1 (contract cheating) and D2 (AI submissions) demand signals
  • Retraction Watch database — for D6 (data fabrication)
  • FOI disclosures — particularly for D2 in countries with active disclosure (UK, US, AUS)
  • Country-specific peer-reviewed surveys — including McCabe / ICAI samples where the country was directly surveyed

What does not count as live data

The following do not qualify a dimension as live:

  • Regional extrapolation — applying Latin American or Southeast Asian averages to a country without country-specific data
  • Literature priors — peer-reviewed estimates from older studies (typically pre-2018)
  • Modelled estimates — derived from other dimensions or country variables rather than direct measurement

These methods are used in the AMI where live data is unavailable, but they reduce the dimension's measurement confidence.
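
One way to picture the live / non-live distinction is as a three-part test on each source. The sketch below is illustrative only: the `Source` class, its field names, and the infrastructure labels are assumptions made for this example, not identifiers from the AMI dataset.

```python
from dataclasses import dataclass

# Labels for the four primary measurement infrastructures (hypothetical names).
PRIMARY_INFRASTRUCTURES = frozenset(
    {"google_trends", "retraction_watch", "foi_disclosure", "country_survey"}
)


@dataclass(frozen=True)
class Source:
    infrastructure: str      # where the signal comes from
    country_specific: bool   # False for regional extrapolation
    current: bool            # False for historical or literature priors


def is_live(source: Source) -> bool:
    """A source is live only if it meets all three criteria."""
    return (
        source.country_specific
        and source.current
        and source.infrastructure in PRIMARY_INFRASTRUCTURES
    )


if __name__ == "__main__":
    print(is_live(Source("google_trends", True, True)))      # country-level, current
    print(is_live(Source("country_survey", False, True)))    # regional average
    print(is_live(Source("literature_prior", True, False)))  # older study
```

Under this reading, a regional average fails the first test, a literature prior fails the second, and a modelled estimate fails the third because it does not come from a primary measurement infrastructure.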

Why all v1.5 countries are flag A

The current 39-country set was selected partly with data availability in mind, which is why every country in it reaches the A threshold. The principal limit on adding more countries to the index is data quality: countries without live Google Trends coverage, without Retraction Watch presence, or without survey data are harder to score with confidence.

Future versions will expand country coverage, including countries that may carry B or C flags. The flag system allows the AMI to expand coverage while making the measurement-confidence trade-off transparent.

How to use the flag in analysis

When comparing country scores

Two countries with the same P-Score and the same flag can be compared directly. Two countries with the same P-Score but different flags should be compared with the flags in view: the score of the country with the lower flag carries wider uncertainty.
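
The comparison rule can be sketched as a helper that returns a short note for a pair of flags. The function name, the ordering table, and the wording of the notes are hypothetical; the guide does not quantify how much wider the uncertainty is, so the sketch does not either.

```python
# Assumed ordering: A is the most confident flag, C the least.
FLAG_ORDER = {"A": 0, "B": 1, "C": 2}


def comparison_guidance(flag_x: str, flag_y: str) -> str:
    """Return a note on how to compare two scores given their flags."""
    if flag_x not in FLAG_ORDER or flag_y not in FLAG_ORDER:
        raise ValueError("flags must be 'A', 'B', or 'C'")
    if flag_x == flag_y:
        return "same flag: compare directly"
    # The flag with the larger order value is the less confident one.
    weaker = max(flag_x, flag_y, key=FLAG_ORDER.__getitem__)
    return f"different flags: the flag {weaker} score has wider uncertainty"


if __name__ == "__main__":
    print(comparison_guidance("A", "A"))
    print(comparison_guidance("A", "C"))
```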

When citing AMI data

Cite both the score and the flag where the audience needs measurement-confidence context. "China P=99.98 (flag A)" is more informative than "China P=99.98" alone.
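
For anyone generating citations programmatically, a formatter along these lines reproduces the pattern in the example. The function name `cite_score` is hypothetical, and the two-decimal formatting is an assumption based on the example value.

```python
def cite_score(country: str, p_score: float, flag: str) -> str:
    """Format a P-Score together with its data quality flag."""
    if flag not in {"A", "B", "C"}:
        raise ValueError("flag must be 'A', 'B', or 'C'")
    return f"{country} P={p_score:.2f} (flag {flag})"


if __name__ == "__main__":
    print(cite_score("China", 99.98, "A"))  # China P=99.98 (flag A)
```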

When making policy comparisons

Cross-country policy comparisons should weigh flag context. Flag A countries are appropriate for like-for-like comparison. Flag B and C countries can still be placed in a relative ranking, but their positions may shift in future versions as data improves.

What drives each dimension's live-data classification

D1 Contract cheating

Live when Google Trends data is available at country resolution with sufficient query volume. Most countries qualify.

D2 AI submissions

Live when Google Trends data is available. Some countries also have live data from FOI or institutional disclosures (the UK via the Guardian, the US via various studies).

D3 Exam impersonation

Most countries have this dimension scored from literature estimates rather than live data. Of the six dimensions, it is the one most likely to be non-live.

D4 Plagiarism

Live when the country is in the McCabe / ICAI survey sample or has equivalent country-specific surveys. Otherwise the dimension is scored by regional extrapolation.

D5 Collusion

Same as D4: live when country-specific survey data exists.

D6 Data fabrication

Live whenever the Retraction Watch database has sufficient country-attributable entries. Most countries with substantial research output qualify.
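
Putting the six per-dimension rules together shows how a country's data availability translates into a flag. This is a rough sketch under stated assumptions: the availability keys (`trends`, `disclosure`, `impersonation_live`, `survey`, `retraction_watch`) are hypothetical names, and the "sufficient volume" and "sufficient entries" thresholds are treated as already decided, because the guide does not specify them.

```python
def live_dimensions(avail: dict[str, bool]) -> set[str]:
    """Return the dimensions that count as live for one country.

    `avail` uses hypothetical keys; a missing key means the data is unavailable.
    """
    has = lambda key: avail.get(key, False)
    rules = {
        "D1": has("trends"),                       # contract cheating
        "D2": has("trends") or has("disclosure"),  # AI submissions
        "D3": has("impersonation_live"),           # exam impersonation, rarely live
        "D4": has("survey"),                       # plagiarism
        "D5": has("survey"),                       # collusion
        "D6": has("retraction_watch"),             # data fabrication
    }
    return {dim for dim, live in rules.items() if live}


def flag_for(avail: dict[str, bool]) -> str:
    """Apply the A/B/C thresholds to the live-dimension count."""
    count = len(live_dimensions(avail))
    return "A" if count >= 5 else "B" if count >= 3 else "C"


if __name__ == "__main__":
    print(flag_for({"trends": True, "survey": True, "retraction_watch": True}))
    print(flag_for({"trends": True, "retraction_watch": True}))
    print(flag_for({"trends": True}))
```

Under these assumptions, a country can reach flag A without live D3 data as long as the other five dimensions are live, while losing survey coverage removes D4 and D5 together and drops the count to three.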

How flag and score interact

The data quality flag is conceptually independent from the score, but they relate in practice:

  • Countries with stronger institutional infrastructure tend to have better data availability and thus higher flags
  • Countries with weak institutional infrastructure often also have weak data availability, producing both lower R-Scores and lower flags
  • This means flag and R-Score correlate; they should be reported together for clarity

Sources

  • AMI v1.5 methodology document
  • Data sources documentation in the methodology

Frequently asked questions

What does the data quality A flag mean in the AMI?

The A flag indicates that 5 or more of the 6 P-Score dimensions are scored from country-specific live data sources (Google Trends, Retraction Watch, FOI data, country-specific surveys). It is the highest data quality flag and indicates the score should be treated as relatively well-grounded.

Do data quality flags affect the score values?

No — the flag is a separate metadata field. The P and R score values are calculated the same way regardless of flag. The flag indicates how confident users should be in the scores. Two countries with the same P-Score can have different flags, and the flag context should be used when interpreting cross-country comparisons.

What is 'live data' in the AMI methodology?

Live data sources are the Google Trends API (current data), the Retraction Watch database (current data), Freedom of Information (FOI) disclosures from government bodies, and country-specific peer-reviewed surveys, including McCabe / ICAI surveys where the specific country is in the survey sample. Literature-derived estimates, modelled estimates, and regional extrapolation count as non-live.

How to cite this article

APA: Booth, F. (2026). AMI Data Quality Flags Explained: A, B, C. Academic Misconduct Index. https://academicmisconductindex.com/blog/data-quality-flags-explained

BibTeX: @misc{booth2026data, author={Booth, Francisco}, title={AMI Data Quality Flags Explained: A, B, C}, year={2026}, url={https://academicmisconductindex.com/blog/data-quality-flags-explained}}

Francisco Booth

Independent researcher, founder of the Academic Misconduct Index