Develop a model capable of reliably flagging biomedical articles (preprints on bioRxiv or papers in biomedical scientific publications) that may be at risk of retraction. Flagged articles would then be carefully reviewed by peers in the community.
Although the retraction of a scientific article in the biomedical literature is still a rare event, it is becoming increasingly frequent [1, 2]. Retractions reflect error, misconduct, or fraud, which can significantly affect the scientific community and undermine the trust that the public puts in science. Detecting articles at risk of retraction could help focus the attention of efforts like Retraction Watch and other post-publication peer review groups. In turn, if the detection of problematic articles becomes more effective, the incentive for fraud would be greatly diminished and the penalty for errors increased, which should improve the overall quality and reliability of the biomedical literature.
Start by using MEDLINE metadata, restricted with the core clinical journals filter. The metadata tag “Retracted Publication” can serve as the ground-truth label.
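
As a minimal sketch of how the ground-truth label could be derived, the Python snippet below reads records in a PubMed-style XML layout and marks each one as retracted or not, based on whether “Retracted Publication” appears among its publication types. The two sample records are made up for illustration, the function name `label_records` is hypothetical, and the element names (`MedlineCitation`, `PMID`, `PublicationType`) are assumed to match the XML export being used. Downloading the records and applying the core clinical journals filter are left out.

```python
import xml.etree.ElementTree as ET

# Publication type used as the ground-truth label.
RETRACTED = "Retracted Publication"

# Toy input: two invented records in a PubMed-style XML layout (assumption).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>Toy record A</ArticleTitle>
        <PublicationTypeList>
          <PublicationType>Journal Article</PublicationType>
          <PublicationType>Retracted Publication</PublicationType>
        </PublicationTypeList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <ArticleTitle>Toy record B</ArticleTitle>
        <PublicationTypeList>
          <PublicationType>Journal Article</PublicationType>
        </PublicationTypeList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def label_records(xml_text):
    """Return one dict per record with its PMID and a 0/1 retraction label."""
    root = ET.fromstring(xml_text)
    rows = []
    for citation in root.iter("MedlineCitation"):
        # PMID is read as a direct child of MedlineCitation.
        pmid = citation.findtext("PMID")
        # Collect every publication type attached to this record.
        types = [pt.text for pt in citation.iter("PublicationType")]
        rows.append({"pmid": pmid, "retracted": int(RETRACTED in types)})
    return rows


labels = label_records(SAMPLE_XML)
for row in labels:
    print(row["pmid"], row["retracted"])
```

Because retractions are rare, the resulting labels will be heavily imbalanced, which is worth keeping in mind when choosing evaluation metrics for the model.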