What Is Data Fabrication in Research? Definition and Famous Cases
Data fabrication is the most clearly defined form of academic misconduct — and the most consequential when it affects research literature. The AMI's D6 dimension tracks it. Here is what it is, how it is detected, and the cases that defined modern research integrity.
TL;DR
Data fabrication is making up research data. Data falsification is altering real data. Both are research misconduct. The AMI's D6 dimension scores fabrication via Retraction Watch data normalised by publication volume. China scores 100 (highest), Russia 78, India 70, Iran 65. Famous cases include Diederik Stapel, Hwang Woo-suk, STAP cells, Marc Hauser, Macchiarini.
Definition
Data fabrication is the creation of research data that was not actually collected or measured. The researcher reports results from experiments that did not occur, observations that were not made, or measurements that were not taken.
Data falsification is altering or selectively reporting real data — manipulating images, omitting inconvenient measurements, or changing values to produce a more favourable result.
Both are research misconduct. The US Office of Research Integrity (ORI) groups them together with plagiarism under the umbrella of research misconduct (FFP: Fabrication, Falsification, Plagiarism).
Why it matters
Data fabrication affects the scientific record. Unlike student plagiarism — which damages credentialing but does not propagate into ongoing research — fabricated data enters the literature, gets cited, and shapes subsequent research:
- Fabricated medical research can mislead clinical practice
- Fabricated psychology research can shape policy decisions
- Fabricated engineering research can affect engineering standards
- Fabricated biology research can lead other researchers to chase non-existent phenomena
The consequences extend far beyond the individual misconduct case.
Detection methods
Statistical analysis
Real data shows expected variance patterns; fabricated data often does not. Forensic statistics has caught multiple major fraud cases by identifying impossible patterns — too-clean distributions, missing variance, statistical impossibilities.
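One well-known check for this kind of statistical impossibility is the GRIM test described by Brown and Heathers. It asks whether a reported mean is arithmetically possible given the sample size, assuming the underlying responses are whole numbers (for example, Likert-scale items). The sketch below is a minimal illustration only; the function name `grim_consistent` and its parameters are assumptions made for this example, not part of any tool named in this article.

```python
def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """Return True if some integer total divided by n rounds to reported_mean.

    Assumes every individual response is an integer, so the sum of all
    responses must also be an integer.
    """
    # Half of the last reported decimal place: the largest rounding error
    # a correctly reported mean could carry (either rounding direction).
    half_unit = 0.5 * 10 ** -decimals
    nearest_total = round(reported_mean * n)
    # Check the nearest integer total and its neighbours.
    for total in (nearest_total - 1, nearest_total, nearest_total + 1):
        if total < 0:
            continue
        if abs(total / n - reported_mean) <= half_unit + 1e-9:
            return True
    return False


# With n = 28 integer responses, no integer total yields a mean of 5.19.
print(grim_consistent(5.19, 28))  # False
print(grim_consistent(5.18, 28))  # True (145 / 28 = 5.1786, which rounds to 5.18)
```

A failed check does not prove fabrication; typos and unreported exclusions produce the same signal. It only flags a reported value that deserves a closer look.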
Replication failures
Other researchers attempt to replicate a finding and discover that they cannot reproduce the results. This is the most direct detection method, though it is slow and expensive.
Image forensics
Manipulation of images in microscopy, gel electrophoresis, and similar techniques can be detected through pixel-level analysis. Specialised tools (PaperWatcher, Imagetwin) check for duplicated or altered images.
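To make "pixel-level analysis" concrete, here is a deliberately simple sketch of duplicate-region detection: it slides a small window over a grayscale image and reports windows whose pixel values are exactly identical. This is an assumption-laden toy, not how the tools named above work; production tools must also handle rotation, scaling, compression noise, and contrast changes. The function name `find_duplicate_blocks` and the `block` parameter are hypothetical.

```python
from collections import defaultdict


def find_duplicate_blocks(image, block=4):
    """Find groups of identical block x block tiles in a grayscale image.

    image: 2D list of integer pixel values.
    Returns a list of groups, each a list of (row, col) top-left corners
    whose tiles contain exactly the same pixels.
    """
    rows, cols = len(image), len(image[0])
    seen = defaultdict(list)
    for r in range(rows - block + 1):
        for c in range(cols - block + 1):
            tile = tuple(
                tuple(image[r + dr][c:c + block]) for dr in range(block)
            )
            # Skip flat tiles (e.g. uniform background), which match trivially.
            if len({p for row in tile for p in row}) == 1:
                continue
            seen[tile].append((r, c))
    return [corners for corners in seen.values() if len(corners) > 1]


# Synthetic 8x8 image with unique pixel values, then a copied 3x3 patch.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
for dr in range(3):
    for dc in range(3):
        image[5 + dr][5 + dc] = image[dr][dc]

print(find_duplicate_blocks(image, block=3))  # [[(0, 0), (5, 5)]]
```

Exact matching like this only catches untouched copy-paste within a single image, which is why it serves here as an illustration of the idea rather than a usable detector.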
Peer review
Reviewers may identify impossible claims or inconsistencies before publication, but peer review is limited in catching fabrication that produces plausible-seeming results.
Post-publication review
PubPeer and similar platforms allow post-publication comment on potential misconduct, and this scrutiny has led to the detection of major cases.
Whistleblower reports
Co-authors, lab members, or institutional colleagues report suspected misconduct. Many famous cases were initiated by whistleblowers.
What the AMI data shows
D6 scores on a 0–100 scale across the 39-country set:
| Top D6 scores | Score |
|---|---|
| China | 100 |
| Russia | 78 |
| India | 70 |
| Iran | 65 |
| Pakistan | 65 |
| Egypt | 60 |
| Nigeria | 55 |
| South Korea | 55 |

| Lowest D6 scores | Score |
|---|---|
| New Zealand | 12 |
| Ireland | 15 |
| Sweden | 15 |
| Norway | 15 |
| Netherlands | 15 |
| Singapore | 20 |
| Kenya | 20 |
| Vietnam | 22 |
| Canada | 22 |
The D6 dimension is built directly from the Retraction Watch database, filtered to misconduct-linked retractions and normalised by publication volume. China's D6=100 reflects the highest misconduct-linked retraction rate per 10,000 publications in the dataset.
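The normalisation step can be sketched roughly as follows. This is an illustration under stated assumptions, not the AMI's actual computation: it assumes scores are obtained by dividing each country's rate per 10,000 publications by the highest rate in the set, and it omits the detection correction factor. The function name `d6_scores` and the country labels and counts are hypothetical placeholders, not real data.

```python
def d6_scores(counts):
    """Scale misconduct-linked retraction rates to a 0-100 score.

    counts: dict mapping country -> (misconduct_retractions, publications).
    Assumption: the country with the highest rate is assigned 100 and all
    others are scaled proportionally.
    """
    # Misconduct-linked retractions per 10,000 publications.
    rates = {
        country: 10_000 * retractions / publications
        for country, (retractions, publications) in counts.items()
    }
    top_rate = max(rates.values())
    return {
        country: round(100 * rate / top_rate)
        for country, rate in rates.items()
    }


# Hypothetical counts, chosen only to show the arithmetic.
example = {
    "Country A": (50, 100_000),  # 5.0 per 10,000
    "Country B": (12, 80_000),   # 1.5 per 10,000
    "Country C": (3, 60_000),    # 0.5 per 10,000
}
print(d6_scores(example))  # {'Country A': 100, 'Country B': 30, 'Country C': 10}
```

Normalising by publication volume is what keeps a large research system from scoring high on raw retraction counts alone; the score reflects the rate, not the total.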
Famous cases
Diederik Stapel (Netherlands, 2011)
Dutch social psychologist who fabricated data in dozens of papers over a period of years. Stapel relinquished his PhD title in the aftermath, and the case prompted broader reform of social psychology research practice in the Netherlands. It remains one of the largest fabrication cases by paper count.
Hwang Woo-suk (South Korea, 2005–2006)
Korean stem cell researcher who claimed to have produced patient-specific stem cell lines through somatic cell nuclear transfer. The results were fabricated; the cloning claims could not be replicated. The case prompted establishment of the Korea Research Integrity (KRI) framework.
STAP cells / Haruko Obokata (Japan, 2014)
Obokata and colleagues claimed a novel method of inducing stem cells by applying stress to ordinary cells. The Nature papers were retracted after replication failures and the identification of image manipulation. The case led to JSPS and MEXT integrity reforms.
Marc Hauser (US, 2010)
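Harvard primatologist found to have fabricated and falsified data in research on primate cognition. A Harvard internal investigation found him responsible for scientific misconduct in 2010; he resigned in 2011, and the ORI issued its own finding of research misconduct in 2012.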
Paolo Macchiarini (Sweden, 2014–2016)
Karolinska surgeon whose synthetic trachea transplant research was found to involve fabricated patient outcomes and missing ethical approvals. Multiple patients died. The case contributed to the establishment of Sweden's NPOF national misconduct board.
The detection-incidence challenge
Detected cases are not the same as actual incidence. The retraction rate measures what gets caught, not what occurs. Countries with stronger detection infrastructure (peer review, replication culture, post-publication review) report more cases. The AMI applies a detection correction factor but the fundamental challenge remains.
Sources
- Retraction Watch Database, Crossref/GitLab (2026)
- Fang, Steen & Casadevall (2012), PNAS: "Misconduct accounts for the majority of retracted scientific publications"
- ORI (US Office of Research Integrity) case reports
- AMI v1.5 methodology document
Frequently asked questions
What is data fabrication?
Data fabrication is making up research data that was not actually collected or measured. It is distinct from data falsification, which is altering or selectively reporting real data. Both are research misconduct. Data fabrication damages the scientific literature by introducing false results that may be cited and built upon by subsequent researchers.
How is data fabrication detected?
Detection methods include statistical analysis of reported data (real data shows expected variance patterns; fabricated data often does not), replication failures, image forensics (image manipulation in microscopy and gel electrophoresis), peer review, post-publication review on platforms like PubPeer, and whistleblower reports. The Retraction Watch database catalogues confirmed cases.
What are famous cases of data fabrication?
Major cases include: Diederik Stapel (Dutch social psychologist, dozens of fabricated papers, 2011); Hwang Woo-suk (Korean stem cell researcher, fabricated cloning results, 2005–2006); Haruko Obokata / STAP cells (RIKEN, 2014); Marc Hauser (Harvard primatologist, 2010); the Macchiarini case (Karolinska, 2014–2016).
How to cite this article
APA: Booth, F. (2026). What Is Data Fabrication in Research? Definition and Famous Cases. Academic Misconduct Index. https://academicmisconductindex.com/blog/what-is-data-fabrication-research
BibTeX: @misc{booth2026what, author={Booth, Francisco}, title={What Is Data Fabrication in Research? Definition and Famous Cases}, year={2026}, url={https://academicmisconductindex.com/blog/what-is-data-fabrication-research}}
Francisco Booth
Independent researcher, founder of the Academic Misconduct Index