Tracking Retractions

Every year, around 300 papers are retracted from the literature for a range of reasons. Some retractions stem from honest error. For example, in a recent case, two papers were retracted because an error in the journal's processes led to the same paper being published three times. In another case, the authors of a very prominent cancer study were simply unable to replicate their results. On the other hand, papers are often retracted due to malpractice, such as plagiarism or data manipulation.

The aim of the project is to provide data on the different causes of retractions across a range of years. Data will come from NCBI PubMed and, if possible, Scopus. The project, to be done in Python, will involve downloading the retraction notices and then applying Natural Language Processing (NLP) to extract the reason for each retraction. That could be as simple as keyword matching, or it could involve more detailed NLP methods. Questions your code will help us answer include: "Is the rate of retractions growing year on year (as a percentage of papers published)?" and "Do any nations stand out when retractions are taken as a percentage of the papers published by authors from each country?" (It may also be a good idea to count and report the number of papers retracted due to other causes, but to reserve detailed analysis for those retractions that are likely to have arisen from author issues.)
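As a rough illustration of the simplest option mentioned above (keyword matching), the sketch below assigns a reason to the text of a retraction notice and computes retractions as a percentage of papers published per year. Everything in it is an assumption made for illustration: the category names, the keyword stems in REASON_KEYWORDS, and the function names classify_notice, count_reasons and retraction_rate_by_year are hypothetical, and the notice texts are assumed to have been downloaded already. A real project would need to refine the keyword lists against actual notices.

```python
from collections import Counter

# Hypothetical keyword stems per retraction reason (illustrative only).
# Categories are checked in this order and the first match wins.
REASON_KEYWORDS = {
    "plagiarism": ["plagiar"],
    "data manipulation": ["fabricat", "falsif", "manipulat"],
    "irreproducible results": [
        "unable to reproduce",
        "unable to replicate",
        "could not be reproduced",
        "could not be replicated",
    ],
    "publisher error": [
        "publisher error",
        "published in error",
        "production error",
        "administrative error",
    ],
}


def classify_notice(text):
    """Return the first reason whose keyword stem occurs in the notice text."""
    lowered = text.lower()
    for reason, stems in REASON_KEYWORDS.items():
        if any(stem in lowered for stem in stems):
            return reason
    return "other"


def count_reasons(notices):
    """Count how many notices fall into each reason category."""
    return Counter(classify_notice(notice) for notice in notices)


def retraction_rate_by_year(retracted, published):
    """Return retractions as a percentage of papers published, per year.

    Both arguments map a year to a count. Years with no recorded
    publications are skipped to avoid dividing by zero.
    """
    return {
        year: 100.0 * retracted.get(year, 0) / total
        for year, total in published.items()
        if total > 0
    }
```

Substring matching on stems such as "plagiar" catches "plagiarism", "plagiarised" and "plagiarized" without a stemmer, at the cost of occasional false positives (for example, a notice stating that no plagiarism was found). The same rate calculation could be applied per country rather than per year by keying the two dictionaries on country.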

Client

Contact Person: Dr Wei Liu (and Prof Michael Wise)
Telephone: 6488 3095
Email: [email protected]
Preferred method of contact:
Location: CSSE

IP Exploitation Model

The client wishes to use a Creative Commons CC BY-NC model to deal with IP embodied in the project.