Retraction Watch on GitLab: The 2023 Crossref Partnership

For most of its history, the Retraction Watch database was a paid resource. The 2023 Crossref partnership changed that by making the database (69,911 records as of April 2026) publicly available on GitLab. Here is what changed and why it matters.

TL;DR

In 2023, Crossref entered a partnership with the Center for Scientific Integrity to make the Retraction Watch database openly available on GitLab. As of April 2026 the database holds 69,911 records, of which 5,390 are misconduct-linked. The partnership was a precondition for the AMI's D6 dimension being measurable from live data.

What changed in 2023

Before the partnership

The Retraction Watch database was operated by the Center for Scientific Integrity (the non-profit behind the Retraction Watch blog). The database was:

  • Accessible to academic institutions through subscription
  • Not freely available to the public or to independent researchers
  • A paid resource that limited broader analytical use

After the partnership

Crossref — the not-for-profit organisation that manages DOIs and bibliographic metadata for the scholarly publishing industry — entered a partnership to host the database openly. The data is now:

  • Hosted on GitLab (open access)
  • Updated periodically
  • Freely available for any research or analytical use
  • Integrated with Crossref's broader scholarly metadata infrastructure

Why this partnership matters

Accessibility transformation

The single biggest change: any researcher, journalist, or policy analyst can now access the full dataset. Previously, access depended on an institutional subscription; now it is open to anyone.

Integration with metadata infrastructure

Crossref provides the underlying DOI infrastructure for most scholarly publications. Integrating Retraction Watch with Crossref links together:

  • Original publication records
  • Retraction notices
  • Author affiliations and credentials
  • Citation networks

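With these links in place, a retraction can be traced from the notice back to the original publication, its authors, and the works that cite it, an analysis that was previously difficult.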

Research enablement

Several systematic analyses became feasible after 2023:

  • Cross-country retraction rate calculations (the AMI's D6 approach)
  • Paper mill cluster identification
  • Co-authorship network analysis on retracted papers
  • Time-series analysis of retraction reasons
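As an illustration of the last item, a time series of retraction reasons amounts to counting reasons per retraction year. The sketch below assumes each record has already been reduced to a `(retraction_year, reasons)` pair; that shape and the reason labels are hypothetical and are not the database's actual schema.

```python
from collections import Counter, defaultdict

def reasons_by_year(records):
    """Count how often each reason appears per retraction year.

    records: iterable of (retraction_year, reasons) pairs, where reasons
    is a list of strings (hypothetical shape, for illustration only).
    """
    table = defaultdict(Counter)
    for retraction_year, reasons in records:
        # A record can carry several reasons; count each once per record.
        table[retraction_year].update(set(reasons))
    return {year: dict(counts) for year, counts in sorted(table.items())}

sample = [
    (2021, ["fraud", "plagiarism"]),
    (2021, ["fraud"]),
    (2022, ["honest error"]),
]
series = reasons_by_year(sample)
```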

Why the AMI needed this

The D6 dimension requires comprehensive coverage

The AMI's D6 (data fabrication) dimension is built from Retraction Watch data:

  1. Filter to misconduct-linked retractions
  2. Country-attribute via author affiliations
  3. Normalise by publication volume from OpenAlex
  4. Calculate per-publication rates
  5. Rescale across the 39-country set

Each step requires comprehensive coverage. Subscription-only access meant the AMI methodology could not be operationalised with current data.
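The five steps above can be sketched in a few lines. This is a minimal illustration, not the AMI's published implementation: the reason labels, the one-country-per-record simplification, and the min-max rescaling to 0-100 are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical reason labels; the real database uses its own vocabulary.
MISCONDUCT_REASONS = {
    "fabrication", "falsification", "image manipulation",
    "plagiarism", "fraud", "manipulation of peer review",
}

def d6_scores(records, publications):
    """records: dicts with 'country' (str) and 'reasons' (list of str).
    publications: dict mapping country -> total publication count
    (for example, taken from OpenAlex)."""
    # Steps 1-2: keep misconduct-linked records and count them per country.
    counts = Counter(
        r["country"] for r in records
        if MISCONDUCT_REASONS & {x.lower() for x in r["reasons"]}
    )
    # Steps 3-4: misconduct-linked retractions per publication.
    rates = {c: counts[c] / n for c, n in publications.items() if n > 0}
    if not rates:
        return {}
    # Step 5: rescale across the country set (min-max is an assumption).
    lo, hi = min(rates.values()), max(rates.values())
    span = hi - lo
    return {c: 100 * (r - lo) / span if span else 0.0 for c, r in rates.items()}

records = [
    {"country": "A", "reasons": ["Fabrication"]},
    {"country": "A", "reasons": ["Honest error"]},
    {"country": "B", "reasons": ["Plagiarism", "Fraud"]},
    {"country": "B", "reasons": ["Image manipulation"]},
]
publications = {"A": 1000, "B": 1000, "C": 500}
scores = d6_scores(records, publications)
```

In this toy input, country B has twice country A's rate and country C has no misconduct-linked retractions, so the rescaled scores place C at the bottom and B at the top of the range. A real run would also need to handle records with authors in several countries.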

Open data enables open methodology

The AMI is published under CC BY 4.0. Open methodology requires open inputs. The Crossref partnership made it possible to build D6 from a fully open data source, which supports the AMI's licensing and reproducibility commitments.

Verification by third parties

Any third party can verify the AMI's D6 calculations by:

  1. Downloading the Retraction Watch data from GitLab
  2. Applying the AMI methodology's filter and normalisation steps
  3. Computing comparable results

Subscription-only data would have prevented this verification.
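The final comparison step might look like the sketch below. It assumes the recomputed and published D6 values are available as dictionaries keyed by country. The function name and the 1% tolerance are hypothetical; some tolerance is needed because the database is updated periodically and a later download will not match an earlier snapshot exactly.

```python
import math

def compare_scores(recomputed, published, rel_tol=0.01):
    """Return countries whose recomputed value is missing or differs
    from the published value by more than rel_tol (relative)."""
    mismatches = {}
    for country, pub in published.items():
        mine = recomputed.get(country)
        if mine is None or not math.isclose(mine, pub, rel_tol=rel_tol, abs_tol=1e-9):
            mismatches[country] = (mine, pub)
    return mismatches

recomputed = {"A": 50.0, "B": 100.0}
published = {"A": 50.2, "B": 90.0, "C": 10.0}
diff = compare_scores(recomputed, published)
```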

Growth pattern of the database

Year          Approx. records
2010          ~5,000
2015          ~10,000
2020          ~30,000
2022          ~45,000
2024          ~65,000
April 2026    69,911
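The average number of records added per year between snapshots follows directly from the table. The short calculation below uses the approximate figures as given and leaves out the April 2026 row because it covers only part of a year.

```python
# Approximate record counts from the table (April 2026 row omitted
# because it is a part-year figure).
snapshots = [(2010, 5_000), (2015, 10_000), (2020, 30_000),
             (2022, 45_000), (2024, 65_000)]

def average_annual_additions(points):
    """Average records added per year between consecutive snapshots."""
    out = []
    for (y0, n0), (y1, n1) in zip(points, points[1:]):
        out.append((f"{y0}-{y1}", (n1 - n0) / (y1 - y0)))
    return out

additions = average_annual_additions(snapshots)
for period, per_year in additions:
    print(f"{period}: ~{per_year:,.0f} records per year")
# 2010-2015: ~1,000 records per year
# 2015-2020: ~4,000 records per year
# 2020-2022: ~7,500 records per year
# 2022-2024: ~10,000 records per year
```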

Pre-2020 growth

Gradual accumulation as Retraction Watch covered ongoing retractions and back-catalogued historical cases. Annual growth in the low thousands.

2020–2022 acceleration

Two factors:

  1. Several major publishers (PLOS, Wiley, Hindawi) ran systematic retraction campaigns identifying paper mill content. Each campaign added thousands of records.
  2. Image forensics tools (Imagetwin, PaperWatcher) drove a wave of image manipulation detection.

2023 partnership and ongoing growth

The Crossref partnership accelerated growth in two ways:

  1. Improved record completeness (previously fragmented coverage filled)
  2. Better integration of recent retractions through Crossref's DOI infrastructure

Database structure

What each record contains

  • Article identifiers (DOI, PubMed ID, etc.)
  • Authors and country attribution
  • Journal and publisher
  • Original publication date
  • Retraction date
  • Retraction reason (multiple categories)
  • Notice text where available
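One way to picture a record is as a small typed structure. The field names below are hypothetical and chosen for readability. They mirror the list above but are not the column names used in the published dataset.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class RetractionRecord:
    # Article identifiers (either may be missing for older items).
    doi: Optional[str]
    pubmed_id: Optional[str]
    # Authors and the countries attributed to them.
    authors: list[str]
    countries: list[str]
    journal: str
    publisher: str
    published: date
    retracted: date
    # A record can carry several reason categories.
    reasons: list[str] = field(default_factory=list)
    notice_text: Optional[str] = None

    def days_to_retraction(self) -> int:
        """Days between original publication and retraction."""
        return (self.retracted - self.published).days

example = RetractionRecord(
    doi="10.1234/example",  # made-up DOI for illustration
    pubmed_id=None,
    authors=["A. Author"],
    countries=["Exampleland"],
    journal="Journal of Examples",
    publisher="Example Press",
    published=date(2020, 1, 15),
    retracted=date(2021, 1, 15),
    reasons=["Falsification"],
)
```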

Reason categorisation

The AMI methodology filters to misconduct-linked retractions:

  • Fabrication
  • Falsification
  • Image manipulation
  • Plagiarism (in research context)
  • Fraud
  • Manipulation of peer review

Excluded: honest errors, duplicate publication, ethics issues, author requests, and other non-misconduct categories.

The misconduct-linked subset is approximately 5,390 of 69,911 records (~7.7%).
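A minimal sketch of the filter and the share calculation follows. It assumes each record's reasons arrive as a single semicolon-separated string and that a keyword match is enough to classify a reason. The delimiter and the keyword list are illustrative assumptions, not the AMI's published rule set.

```python
# Keywords taken from the category list above; matching by substring
# is an assumption made for this sketch.
MISCONDUCT_KEYWORDS = (
    "fabrication", "falsification", "image manipulation",
    "plagiarism", "fraud", "manipulation of peer review",
)

def split_reasons(raw: str) -> list[str]:
    """Split a semicolon-separated reason string into clean lowercase labels."""
    return [p.strip(" +").lower() for p in raw.split(";") if p.strip(" +")]

def is_misconduct_linked(raw: str) -> bool:
    """True if any reason on the record matches a misconduct keyword."""
    return any(k in reason
               for reason in split_reasons(raw)
               for k in MISCONDUCT_KEYWORDS)

# Share of misconduct-linked records, using the figures quoted above.
share = 5_390 / 69_911
print(f"{share:.1%}")  # 7.7%
```

A record is kept if any one of its reasons matches, so a retraction listing both an honest error and falsification counts as misconduct-linked under this sketch.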

What other instruments depend on the partnership

The AMI's D6 dimension

D6 is the most directly dependent instrument. Without the partnership, D6 measurement would not have been feasible from current data.

Academic research on retraction patterns

Numerous research projects rely on the Retraction Watch data:

  • Fang, Steen & Casadevall (2012, PNAS) — established the misconduct-share of retractions
  • Liang et al. (2024) — AI content in Chinese academic papers
  • Various country and discipline-specific analyses

Policy analysis

Government and intergovernmental bodies (OECD, UNESCO, individual national research councils) increasingly cite Retraction Watch data in research integrity policy documents.

Journalism

Investigative journalism on research misconduct depends on the database. Major recent investigations (the Wansink case, the Hwang Woo-suk reanalysis, paper mill exposés) used Retraction Watch as the principal source.

Operational implications

For Retraction Watch

The partnership shifted Retraction Watch's funding model. Previously, subscription revenue supported operations; grant and foundation funding now plays a larger role. The Center for Scientific Integrity continues to operate the database with Crossref as hosting partner.

For Crossref

The partnership expanded Crossref's scope beyond DOI management and standard metadata. The integration with research integrity data is a meaningful expansion of Crossref's mission.

For institutional subscribers

Existing subscribers retained access and also benefited from the broader integration. The partnership did not strand earlier institutional investment.

What this enables next

Real-time integration

The Crossref infrastructure enables retraction notices to flow more rapidly through to publication databases, citation tools, and research analytics platforms.

Cross-database joins

Joining Retraction Watch with other datasets that share Crossref identifiers and metadata (publication corpora, ORCID profiles, funding records) enables analyses that were previously much harder.
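In practice, such a join usually keys on the DOI. The sketch below assumes both datasets are lists of dictionaries with a `doi` field (a hypothetical shape) and normalises DOIs first, since DOIs are case-insensitive and often appear with a resolver prefix.

```python
def normalise_doi(doi: str) -> str:
    """Lowercase a DOI and strip common resolver prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi

def join_on_doi(retractions, works):
    """Inner join of retraction records with publication records by DOI."""
    by_doi = {normalise_doi(w["doi"]): w for w in works if w.get("doi")}
    joined = []
    for r in retractions:
        if not r.get("doi"):
            continue  # records without a DOI cannot be joined this way
        match = by_doi.get(normalise_doi(r["doi"]))
        if match is not None:
            # Retraction fields take precedence on key collisions.
            joined.append({**match, **r})
    return joined

retractions = [{"doi": "10.1234/ABC", "reason": "fraud"}, {"doi": None}]
works = [{"doi": "https://doi.org/10.1234/abc", "funder": "Example Fund"}]
merged = join_on_doi(retractions, works)
```

Records that carry only a PubMed ID would need a second key or a prior identifier mapping.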

Future AMI versions

The partnership ensures AMI D6 measurement can continue. Future AMI versions (v2.0 with expanded country coverage, planned methodological improvements) will continue to rely on the open Retraction Watch data.

Sources

Full methodology | Download dataset

Frequently asked questions

When did Retraction Watch become publicly available?

In 2023, Crossref entered a partnership with the Center for Scientific Integrity to make the Retraction Watch database openly available. The data is hosted on GitLab and updated periodically. Before 2023, the database was accessible to academic institutions through subscription but not freely available to the public.

Why is the Retraction Watch / Crossref partnership important?

The partnership transformed accessibility of research misconduct data. Before 2023, systematic analysis of retraction patterns required institutional access. After the partnership, any researcher, journalist, or policy analyst can access the full dataset. The Academic Misconduct Index's D6 dimension would not have been measurable from current data without this partnership.

How big is the Retraction Watch database now?

As of April 2026, the database contains 69,911 retraction records, with 5,390 classified as misconduct-related (fabrication, falsification, image manipulation, plagiarism, fraud, manipulation of peer review). The database has grown from approximately 5,000 records in 2010 to its current size, partly through ongoing detection, partly through historical record cataloguing, and partly through systematic paper mill detection efforts.

How to cite this article

APA: Booth, F. (2026). Retraction Watch on GitLab: The 2023 Crossref Partnership. Academic Misconduct Index. https://academicmisconductindex.com/blog/retraction-watch-gitlab-move

BibTeX: @misc{booth2026retraction, author={Booth, Francisco}, title={Retraction Watch on GitLab: The 2023 Crossref Partnership}, year={2026}, url={https://academicmisconductindex.com/blog/retraction-watch-gitlab-move}}

Francisco Booth

Independent researcher, founder of the Academic Misconduct Index