Retraction Watch on GitLab: The 2023 Crossref Partnership
For most of its history, the Retraction Watch database was a paid resource. A 2023 partnership with Crossref changed that: the database is now openly available on GitLab and held 69,911 records as of April 2026. Here is what changed and why it matters.
TL;DR
In 2023, Crossref entered a partnership with the Center for Scientific Integrity to make the Retraction Watch database openly available on GitLab. As of April 2026 the public database holds 69,911 records, 5,390 of them misconduct-linked. The partnership was a precondition for measuring the AMI's D6 dimension from live data.
What changed in 2023
Before the partnership
The Retraction Watch database was operated by the Center for Scientific Integrity (the non-profit behind the Retraction Watch blog). The database was:
- Accessible to academic institutions through subscription
- Not freely available to the public or to independent researchers
- A paid resource that limited broader analytical use
After the partnership
Crossref, the not-for-profit organisation that manages DOIs and bibliographic metadata for the scholarly publishing industry, entered a partnership to host the database openly. The data is now:
- Hosted on GitLab (open access)
- Updated periodically
- Freely available for any research or analytical use
- Integrated with Crossref's broader scholarly metadata infrastructure
Why this partnership matters
Accessibility transformation
The single biggest change: any researcher, journalist, or policy analyst can now access the full dataset. Access used to be limited to subscribing institutions; it is now open to anyone.
Integration with metadata infrastructure
Crossref provides the underlying DOI infrastructure for most scholarly publications. Integrating Retraction Watch with Crossref links:
- Original publication records
- Retraction notices
- Author affiliations and credentials
- Citation networks
These links make analyses that were previously difficult, such as tracking how often papers continue to be cited after retraction, far more practical.
Research enablement
Several systematic analyses became feasible after 2023:
- Cross-country retraction rate calculations (the AMI's D6 approach)
- Paper mill cluster identification
- Co-authorship network analysis on retracted papers
- Time-series analysis of retraction reasons
Why the AMI needed this
The D6 dimension requires comprehensive coverage
The AMI's D6 (data fabrication) dimension is built from Retraction Watch data:
- Filter to misconduct-linked retractions
- Attribute each retraction to countries via author affiliations
- Normalise by publication volume from OpenAlex
- Calculate per-publication rates
- Rescale across the 39-country set
Each step requires comprehensive coverage. Subscription-only access meant the AMI methodology could not be operationalised with current data.
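The five D6 steps listed above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the AMI's published implementation: the record layout (dicts with `reasons` and `countries` lists), the placeholder category labels, full counting of multi-country papers, and min-max rescaling to a 0 to 100 range are all assumptions made for the example.

```python
from collections import Counter

# Placeholder category labels (assumption); the AMI's exact reason taxonomy may differ.
MISCONDUCT_REASONS = {
    "fabrication", "falsification", "image manipulation",
    "plagiarism", "fraud", "manipulation of peer review",
}

def d6_scores(records, publications_by_country, countries):
    """Sketch of the D6 steps. Each record is assumed to be a dict with a
    'reasons' list and a 'countries' list derived from author affiliations;
    publications_by_country holds publication counts (e.g. from OpenAlex)."""
    # 1. Filter to misconduct-linked retractions.
    flagged = [r for r in records
               if any(reason in MISCONDUCT_REASONS for reason in r["reasons"])]
    # 2. Attribute each retraction once to every distinct affiliation country
    #    (full counting is an assumption; fractional counting is an alternative).
    counts = Counter()
    for r in flagged:
        for country in set(r["countries"]):
            counts[country] += 1
    # 3-4. Normalise by publication volume to get a per-publication rate.
    rates = {c: counts[c] / publications_by_country[c] for c in countries}
    # 5. Rescale across the country set; min-max to 0-100 is assumed here.
    lo, hi = min(rates.values()), max(rates.values())
    span = (hi - lo) or 1.0  # avoid division by zero if all rates are equal
    return {c: 100 * (rates[c] - lo) / span for c in countries}
```

With real inputs, `countries` would be the 39-country set and `records` the parsed Retraction Watch export.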
Open data enables open methodology
The AMI is published under CC BY 4.0. Open methodology requires open inputs. The Crossref partnership made it possible to build D6 from a fully open data source — supporting the AMI's licensing and reproducibility commitments.
Verification by third parties
Any third party can verify the AMI's D6 calculations by:
- Downloading the Retraction Watch data from GitLab
- Applying the AMI methodology's filter and normalisation steps
- Computing comparable results
Subscription-only data would have prevented this verification.
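The final verification step, comparing recomputed scores with published ones, might look like the sketch below. The `tolerance` parameter and its default of 0.5 points are hypothetical; some tolerance is sensible because the GitLab data is updated periodically, so a snapshot taken on a different date can shift scores slightly.

```python
def compare_results(recomputed, published, tolerance=0.5):
    """Return countries whose recomputed score differs from the published value
    by more than `tolerance` points, or that appear on only one side.
    Values are (recomputed, published) pairs; None marks a missing country."""
    mismatches = {}
    for country in sorted(set(recomputed) | set(published)):
        a, b = recomputed.get(country), published.get(country)
        if a is None or b is None or abs(a - b) > tolerance:
            mismatches[country] = (a, b)
    return mismatches
```

An empty result means the third-party recomputation agrees with the published figures within the chosen tolerance.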
Growth pattern of the database
| Year | Approx. cumulative records |
|---|---|
| 2010 | ~5,000 |
| 2015 | ~10,000 |
| 2020 | ~30,000 |
| 2022 | ~45,000 |
| 2024 | ~65,000 |
| April 2026 | 69,911 |
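Because the table gives cumulative totals, the average number of records added per year between rows is the difference in totals divided by the elapsed years. A small sketch of that arithmetic follows; treating April 2026 as year 2026.25 is an assumption made only for this calculation, and the earlier figures are approximate, so the results are rough.

```python
# Approximate cumulative record counts from the table above.
milestones = [(2010, 5_000), (2015, 10_000), (2020, 30_000),
              (2022, 45_000), (2024, 65_000), (2026.25, 69_911)]

def annual_additions(points):
    """Average records added per year between consecutive milestones."""
    return [
        (start, end, (n1 - n0) / (end - start))
        for (start, n0), (end, n1) in zip(points, points[1:])
    ]

for start, end, per_year in annual_additions(milestones):
    print(f"{start}-{end}: ~{per_year:,.0f} records/year")
```

For example, the 2010 to 2015 interval works out to about 1,000 records a year.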
Pre-2020 growth
Gradual accumulation as Retraction Watch covered ongoing retractions and back-catalogued historical cases. Annual growth in the low thousands.
2020–2022 acceleration
Two factors:
- Several major publishers (PLOS, Wiley, Hindawi) ran systematic retraction campaigns identifying paper mill content. Each campaign added thousands of records.
- Image forensics tools (Imagetwin, PaperWatcher) drove a wave of image manipulation detection.
2023 partnership and ongoing growth
The Crossref partnership accelerated growth in two ways:
- Improved record completeness, filling previously fragmented coverage
- Better integration of recent retractions through Crossref's DOI infrastructure
Database structure
What each record contains
- Article identifiers (DOI, PubMed ID, etc.)
- Authors and country attribution
- Journal and publisher
- Original publication date
- Retraction date
- Retraction reason (multiple categories)
- Notice text where available
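One way to hold a record with these fields in code is a small dataclass. The field names below are hypothetical and are not the exact column names of the GitLab export; the semicolon-delimited format assumed by `split_multi` for multi-valued cells (authors, countries, reasons) is likewise an assumption to check against the actual file.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

def split_multi(raw: str) -> list[str]:
    """Split a semicolon-delimited multi-value cell (assumed export format)."""
    return [part.strip() for part in raw.split(";") if part.strip()]

@dataclass
class RetractionRecord:
    """Illustrative shape of one record; all field names are hypothetical."""
    original_doi: Optional[str]
    pubmed_id: Optional[str]
    authors: list[str]
    countries: list[str]
    journal: str
    publisher: str
    original_date: Optional[date]
    retraction_date: Optional[date]
    # A single record can carry several reason categories.
    reasons: list[str] = field(default_factory=list)
    notice_text: Optional[str] = None

    def days_to_retraction(self) -> Optional[int]:
        """Days between publication and retraction, if both dates are known."""
        if self.original_date and self.retraction_date:
            return (self.retraction_date - self.original_date).days
        return None
```

Keeping `reasons` as a list reflects the fact that a record can be tagged with multiple reason categories.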
Reason categorisation
The AMI methodology filters to misconduct-linked retractions:
- Fabrication
- Falsification
- Image manipulation
- Plagiarism (in research context)
- Fraud
- Manipulation of peer review
Excluded: honest errors, duplicate publication, ethics issues, author requests, and other non-misconduct categories.
The misconduct-linked subset is approximately 5,390 of 69,911 records (~7.7%).
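A minimal sketch of the misconduct filter is shown below, along with the share arithmetic. It assumes each record stores its reasons in one semicolon-separated string with an optional leading `+` on each label; that format, the keyword list, and the example reason strings in the usage are assumptions. Substring matching is a simplification, and a production filter would map each of the database's exact reason labels explicitly.

```python
# Hypothetical keywords mirroring the categories listed above.
MISCONDUCT_KEYWORDS = (
    "fabrication", "falsification", "image manipulation",
    "plagiarism", "fraud", "manipulation of peer review",
)

def is_misconduct(reason_field: str) -> bool:
    """True if any semicolon-separated reason contains a misconduct keyword."""
    reasons = [r.strip().lstrip("+").lower() for r in reason_field.split(";")]
    return any(
        keyword in reason
        for reason in reasons if reason
        for keyword in MISCONDUCT_KEYWORDS
    )

# Share of misconduct-linked records in the April 2026 snapshot.
share = 5_390 / 69_911
print(f"{share:.1%}")  # 7.7%
```

For instance, a record tagged `+Falsification/Fabrication of Data;+Investigation by Journal/Publisher;` would be kept, while one tagged only `+Error in Data;` would be excluded.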
What other instruments depend on the partnership
The AMI's D6 dimension
This is the most directly dependent instrument. Without the partnership, D6 could not have been measured from current data.
Academic research on retraction patterns
Numerous research projects rely on the Retraction Watch data:
- Fang, Steen & Casadevall (2012, PNAS): estimated the share of retractions attributable to misconduct
- Liang et al. (2024): AI content in Chinese academic papers
- Various country and discipline-specific analyses
Policy analysis
Government and intergovernmental bodies (OECD, UNESCO, individual national research councils) increasingly cite Retraction Watch data in research integrity policy documents.
Journalism
Investigative journalism on research misconduct depends on the database. Major recent investigations (the Wansink case, the Hwang Woo-suk reanalysis, paper mill exposés) used Retraction Watch as the principal source.
Operational implications
For Retraction Watch
The partnership shifted Retraction Watch's funding model. Previously, subscription revenue supported operations; now grant and foundation funding plays a larger role. The Center for Scientific Integrity continues to operate the database with Crossref as hosting partner.
For Crossref
The partnership expanded Crossref's scope beyond DOI management and standard metadata. The integration with research integrity data is a meaningful expansion of Crossref's mission.
For institutional subscribers
Pre-existing subscribers retained access and also benefited from the broader integration. The partnership did not strand earlier institutional investment.
What this enables next
Real-time integration
The Crossref infrastructure enables retraction notices to flow more rapidly through to publication databases, citation tools, and research analytics platforms.
Cross-database joins
Joining Retraction Watch with other Crossref-managed datasets (publication corpora, ORCID profiles, funding records) enables analyses that were previously much harder.
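DOIs are the natural join key for such analyses. The sketch below shows an inner join of two lists of dicts on a normalised DOI; the `doi` key name and the dict-based rows are assumptions for illustration. Normalisation lower-cases the DOI (DOI names are case-insensitive) and strips common resolver prefixes, so that `https://doi.org/10.1234/ABC` and `10.1234/abc` match.

```python
def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and strip common resolver prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi

def join_on_doi(retractions, other):
    """Inner-join two lists of dicts on a normalised 'doi' key (hypothetical
    field name). On key conflicts, values from `other` take precedence."""
    index = {normalise_doi(row["doi"]): row for row in other if row.get("doi")}
    joined = []
    for rec in retractions:
        if not rec.get("doi"):
            continue  # records without a DOI cannot be joined this way
        match = index.get(normalise_doi(rec["doi"]))
        if match is not None:
            joined.append({**rec, **match})
    return joined
```

Records lacking a DOI are skipped here; matching them would need a different key, such as a PubMed ID or a title-based fuzzy match.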
Future AMI versions
The partnership ensures AMI D6 measurement can continue. Future AMI versions (v2.0 with expanded country coverage, planned methodological improvements) will continue to rely on the open Retraction Watch data.
Sources
- Crossref / Retraction Watch partnership announcement (2023) [verify exact URL]
- Retraction Watch Database on GitLab
- Retraction Watch blog and Center for Scientific Integrity
- AMI v1.5 methodology document
Frequently asked questions
When did Retraction Watch become publicly available?
In 2023, Crossref entered a partnership with the Center for Scientific Integrity to make the Retraction Watch database openly available. The data is hosted on GitLab and updated periodically. Before 2023, the database was accessible to academic institutions through subscription but not freely available to the public.
Why is the Retraction Watch / Crossref partnership important?
The partnership transformed accessibility of research misconduct data. Before 2023, systematic analysis of retraction patterns required institutional access. After the partnership, any researcher, journalist, or policy analyst can access the full dataset. The Academic Misconduct Index's D6 dimension would not have been measurable from current data without this partnership.
How big is the Retraction Watch database now?
As of April 2026, the database contains 69,911 retraction records, with 5,390 classified as misconduct-related (fabrication, falsification, image manipulation, plagiarism, fraud, and manipulation of peer review). The database has grown from approximately 5,000 records in 2010 to its current size, partly through ongoing detection, partly through historical record cataloguing, and partly through systematic paper mill detection efforts.
How to cite this article
APA: Booth, F. (2026). Retraction Watch on GitLab: The 2023 Crossref Partnership. Academic Misconduct Index. https://academicmisconductindex.com/blog/retraction-watch-gitlab-move
BibTeX: @misc{booth2026retraction, author={Booth, Francisco}, title={Retraction Watch on GitLab: The 2023 Crossref Partnership}, year={2026}, url={https://academicmisconductindex.com/blog/retraction-watch-gitlab-move}}
Francisco Booth
Independent researcher, founder of the Academic Misconduct Index