Identifying biomedical articles at risk for retraction

Develop a model to analyze the content of new biomedical articles to determine the likelihood of fraud or scientific error.

Start date: October 2016
Category: Applied Research
Contact point: Andrew Goldstein -

Problem description

Develop a model capable of reliably flagging biomedical articles (appearing on bioRxiv or in biomedical scientific publications) that may be at risk of retraction. Such articles would then be carefully reviewed by peers in the community.

Why this problem matters

Although the retraction of a scientific article in the biomedical literature is still a rare event, it is becoming increasingly frequent [1, 2]. Retractions reflect error, misconduct, and fraud, which can significantly affect the scientific community and undermine the trust that the public places in science. Detecting articles at risk of retraction could help focus the attention of efforts like Retraction Watch and other post-publication peer review groups. In turn, if the detection of problematic articles becomes more effective, the incentive for fraud is greatly diminished and the penalty for errors is increased, which should improve the overall quality and reliability of the biomedical literature.


Start by using MEDLINE metadata [3], with the core clinical journals filter. The metadata tag “Retracted Publication” can serve as the ground-truth label.
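
As a starting point, the labeling step could look something like the sketch below. It assumes the MEDLINE records have been exported as PubMed-style XML with the core clinical journals filter already applied at query time. The sample records, the function name label_records, and the output field names are all made up for illustration. The code only derives the binary label from the publication type list; it is not a retraction-risk model.

```python
import xml.etree.ElementTree as ET

# Made-up records in the general shape of PubMed/MEDLINE XML (illustration only).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>Hypothetical study A</ArticleTitle>
        <PublicationTypeList>
          <PublicationType>Journal Article</PublicationType>
          <PublicationType>Retracted Publication</PublicationType>
        </PublicationTypeList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <ArticleTitle>Hypothetical study B</ArticleTitle>
        <PublicationTypeList>
          <PublicationType>Journal Article</PublicationType>
        </PublicationTypeList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

RETRACTED_TAG = "Retracted Publication"


def label_records(xml_text):
    """Return one dict per citation with a boolean 'retracted' label."""
    root = ET.fromstring(xml_text)
    rows = []
    for citation in root.iter("MedlineCitation"):
        # Collect every publication type attached to this citation.
        pub_types = {pt.text for pt in citation.iter("PublicationType")}
        rows.append({
            "pmid": citation.findtext("PMID"),
            "title": citation.findtext("Article/ArticleTitle"),
            "retracted": RETRACTED_TAG in pub_types,
        })
    return rows


if __name__ == "__main__":
    for row in label_records(SAMPLE_XML):
        print(row["pmid"], row["retracted"], row["title"])
```

Because retraction is a rare event, the positive class produced this way will be very small relative to the negative class, so any model trained on these labels would likely need to account for heavy class imbalance.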