Guide | 20 May 2026

AMI Data Quality Flags Explained: A, B, C

The AMI publishes a data quality flag alongside each country's scores. This guide explains what the flags mean, how they are assigned, and how to use them in cross-country comparisons.

TL;DR

Each AMI country carries a data quality flag: A (5+ dimensions from live data), B (3–4 live), C (<3 live). All 39 countries in v1.5 currently carry the A flag. The flag is a measurement-confidence indicator separate from the P or R score values.

The flag system

The data quality flag indicates how many of the six P-Score dimensions are scored from country-specific live data sources:

A — 5 or more dimensions from live country-specific data
B — 3 or 4 dimensions from live country-specific data
C — fewer than 3 dimensions from live country-specific data
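The thresholds above can be expressed as a small lookup. This is a minimal sketch, not the AMI's own code: the function name `assign_flag` and its parameters are hypothetical, and only the A/B/C cut-offs come from this guide.

```python
def assign_flag(live_dimensions: int, total_dimensions: int = 6) -> str:
    """Map the number of live-scored P-Score dimensions to a flag.

    Hypothetical helper: the cut-offs (5+, 3-4, fewer than 3) follow the
    guide; the function itself is an illustration.
    """
    if not 0 <= live_dimensions <= total_dimensions:
        raise ValueError("live_dimensions must be between 0 and total_dimensions")
    if live_dimensions >= 5:
        return "A"
    if live_dimensions >= 3:
        return "B"
    return "C"
```

For example, a country with four live dimensions would be flagged B under this sketch, and one with two live dimensions would be flagged C.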

The flag is published in the dataset alongside the P-Score, R-Score, and dimension breakdowns.

What counts as live data

The AMI methodology defines live data as sources that:

Provide country-specific signal (not regional extrapolation)
Are based on current observations (not historical priors)
Come from one of the AMI's primary measurement infrastructures

The primary measurement infrastructures are:

Google Trends API — for D1 (contract cheating) and D2 (AI submissions) demand signals
Retraction Watch database — for D6 (data fabrication)
FOI disclosures — particularly for D2 in countries with active disclosure (UK, US, AUS)
Country-specific peer-reviewed surveys — including McCabe / ICAI samples where the country was directly surveyed
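One way to read the three criteria together is as a single check that all must pass. The sketch below assumes a hypothetical `SourceRecord` structure and illustrative string labels for the four infrastructures; none of these names come from the AMI dataset.

```python
from dataclasses import dataclass

# Illustrative labels for the four primary measurement infrastructures
# listed in this guide. The string values are assumptions.
PRIMARY_INFRASTRUCTURES = {
    "google_trends",
    "retraction_watch",
    "foi_disclosure",
    "country_survey",
}


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical description of the source behind one dimension score."""
    infrastructure: str      # e.g. "google_trends" or "literature_prior"
    country_specific: bool   # False for regional extrapolation
    current: bool            # False for historical priors


def is_live(source: SourceRecord) -> bool:
    """A source counts as live only if it meets all three criteria."""
    return (
        source.country_specific
        and source.current
        and source.infrastructure in PRIMARY_INFRASTRUCTURES
    )
```

Under this reading, a current Google Trends series at country resolution passes, while a survey result extrapolated from a regional average fails on the first criterion.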

What does not count as live data

The following do not qualify a dimension as live:

Regional extrapolation — applying Latin American or Southeast Asian averages to a country without country-specific data
Literature priors — peer-reviewed estimates from older studies (typically pre-2018)
Modelled estimates — derived from other dimensions or country variables rather than direct measurement

These methods are used in the AMI where live data is unavailable, but they reduce the dimension's measurement confidence.

Why all v1.5 countries are flag A

The current 39-country set was selected partly with data availability in mind. The principal limit on adding more countries to the index is data quality — countries without live Google Trends coverage, without Retraction Watch presence, or without survey data are harder to score with confidence.

Future versions will expand country coverage, including countries that may carry B or C flags. The flag system allows the AMI to expand coverage while making the measurement-confidence trade-off transparent.

How to use the flag in analysis

When comparing country scores

Two countries with the same P-Score and the same flag can be compared directly. When two countries have the same P-Score but different flags, read the comparison in light of the flags: the score of the country with the lower flag carries wider uncertainty.
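As a rough sketch, the comparison rule could be encoded like this. The function name `comparison_note` and the wording of the returned notes are hypothetical; the rule itself (same flag means direct comparison, otherwise the lower flag has wider uncertainty) is the one stated above.

```python
# Lower number means higher data quality.
FLAG_ORDER = {"A": 0, "B": 1, "C": 2}


def comparison_note(flag_x: str, flag_y: str) -> str:
    """Return a short note on how to compare two scores given their flags."""
    if flag_x not in FLAG_ORDER or flag_y not in FLAG_ORDER:
        raise ValueError("flags must be one of A, B, C")
    if flag_x == flag_y:
        return "direct"
    # The flag with the larger rank is the lower-quality one.
    weaker = max(flag_x, flag_y, key=FLAG_ORDER.__getitem__)
    return f"compare with caution: the flag {weaker} score has wider uncertainty"
```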

When citing AMI data

Cite both the score and the flag where the audience needs measurement-confidence context. "China P=99.98 (flag A)" is more informative than "China P=99.98" alone.
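If scores are cited programmatically, a small formatter keeps the flag attached to the value. This is only a sketch; `cite_score` is a hypothetical helper, and the output format simply mirrors the example above.

```python
def cite_score(country: str, p_score: float, flag: str) -> str:
    """Format a P-Score citation that always carries its data quality flag."""
    return f"{country} P={p_score:.2f} (flag {flag})"


print(cite_score("China", 99.98, "A"))  # China P=99.98 (flag A)
```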

When making policy comparisons

Cross-country policy comparisons should weigh flag context. Flag A countries are appropriate for like-for-like comparison. Flag B and C countries are suitable for relative ranking, but their scores may shift in future versions as data improves.

What drives each dimension's live-data classification

D1 Contract cheating

Live when Google Trends data is available at country resolution with sufficient query volume. Most countries qualify.

D2 AI submissions

Live when Google Trends data is available. Some countries also have live data from FOI or institutional disclosures (UK via the Guardian, US via various studies).

D3 Exam impersonation

For most countries this dimension is scored from literature estimates rather than live data, which makes it the dimension most likely to be non-live.

D4 Plagiarism

Live when the country is in the McCabe / ICAI survey sample or has equivalent country-specific surveys. Otherwise the dimension is scored by regional extrapolation.

D5 Collusion

Same as D4: live when country-specific survey data exists.

D6 Data fabrication

Live whenever the Retraction Watch database has sufficient country-attributable entries. Most countries with substantial research output qualify.
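The per-dimension rules above can be summarised as a lookup from dimension to the sources that would make it live. The sketch below is a simplification under stated assumptions: the source labels are illustrative, D3 is left empty because this guide names no live source for it, and the "sufficient query volume" and "sufficient entries" conditions are not modelled.

```python
# Dimension -> source types that qualify it as live, per the descriptions above.
# Labels are illustrative assumptions, not AMI dataset field names.
QUALIFYING_SOURCES = {
    "D1": {"google_trends"},
    "D2": {"google_trends", "foi_disclosure"},
    "D3": set(),  # the guide names no live source for exam impersonation
    "D4": {"country_survey"},
    "D5": {"country_survey"},
    "D6": {"retraction_watch"},
}


def live_dimensions(available_sources: set[str]) -> list[str]:
    """List the dimensions that would be live given a country's sources."""
    return [
        dim
        for dim, sources in QUALIFYING_SOURCES.items()
        if sources & available_sources
    ]


print(live_dimensions({"google_trends", "retraction_watch"}))
# ['D1', 'D2', 'D6']
```

In this sketch a country with Google Trends, Retraction Watch and country-specific survey coverage reaches five live dimensions, which meets the threshold for flag A even with D3 scored from literature.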

Flag and score interact

The data quality flag is conceptually independent of the score, but the two are related in practice:

Countries with stronger institutional infrastructure tend to have better data availability and thus higher flags
Countries with weak institutional infrastructure often also have weak data availability, producing both lower R-Scores and lower flags

As a result, flag and R-Score tend to correlate, so they should be reported together for clarity.

Sources

AMI v1.5 methodology document
Data sources documentation in the methodology

Frequently asked questions

What does the data quality A flag mean in the AMI?

The A flag indicates that 5 or more of the 6 P-Score dimensions are scored from country-specific live data sources (Google Trends, Retraction Watch, FOI data, country-specific surveys). It is the highest data quality flag and indicates the score should be treated as relatively well-grounded.

Do data quality flags affect the score values?

No — the flag is a separate metadata field. The P and R score values are calculated the same way regardless of flag. The flag indicates how confident users should be in the scores. Two countries with the same P-Score can have different flags, and the flag context should be used when interpreting cross-country comparisons.

What is 'live data' in the AMI methodology?

Live data sources are: Google Trends API (current data), Retraction Watch database (current data), Freedom of Information disclosures from government bodies, and McCabe / ICAI surveys where the specific country is in the survey sample. Literature-derived estimates and regional extrapolation count as non-live.

How to cite this article

APA: Booth, F. (2026). AMI Data Quality Flags Explained: A, B, C. Academic Misconduct Index. https://academicmisconductindex.com/blog/data-quality-flags-explained

BibTeX: @misc{booth2026data, author={Booth, Francisco}, title={AMI Data Quality Flags Explained: A, B, C}, year={2026}, url={https://academicmisconductindex.com/blog/data-quality-flags-explained}}

Francisco Booth

Independent researcher, founder of the Academic Misconduct Index
