The aim of the project is to provide data on the different causes of retractions across a range of years. Data will come from NCBI PubMed and, if possible, Scopus. The project, to be done in Python, will involve downloading retraction notices and then applying Natural Language Processing (NLP) to extract the reason for each retraction. That could be as simple as keyword matching, or it could use more detailed NLP methods. Questions your code will help us answer include: "Is the rate of retractions growing year on year (as a percentage of papers published)?" and "Do any nations stand out, with retractions measured as a percentage of the papers published by authors from each country?" (It may also be a good idea to count and report the number of papers retracted for other causes, but to reserve detailed analysis for retractions that are likely to have arisen from author issues.)
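
As a starting point, the simplest keyword approach and the year-on-year rate could be sketched as below. This is only an illustration under stated assumptions: the reason categories, the keyword lists, and the function names `classify_notice` and `retraction_rate_by_year` are hypothetical and would need refining against real retraction notices. Downloading the notices is not shown.

```python
# Hypothetical keyword lists (assumptions for illustration only);
# real categories should be derived from actual retraction notices.
REASON_KEYWORDS = {
    "plagiarism": ["plagiar", "duplicate publication"],
    "data_issues": ["fabricat", "falsif", "image manipulation"],
    "honest_error": ["honest error", "unintentional", "miscalculation"],
}


def classify_notice(text):
    """Return the sorted reason labels whose keywords occur in the notice text.

    Matching is a case-insensitive substring search, so the stem "plagiar"
    matches both "plagiarism" and "plagiarised". A notice with no matching
    keyword is labelled "unclassified".
    """
    lowered = text.lower()
    labels = [
        label
        for label, keywords in REASON_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    return sorted(labels) or ["unclassified"]


def retraction_rate_by_year(retracted_per_year, published_per_year):
    """Return retractions as a percentage of papers published, keyed by year.

    Both arguments are dicts mapping year -> count. Years with no published
    papers are skipped to avoid dividing by zero.
    """
    return {
        year: 100.0 * retracted_per_year.get(year, 0) / total
        for year, total in published_per_year.items()
        if total > 0
    }
```

For example, `classify_notice("Retracted because the data were fabricated.")` yields `["data_issues"]`. The same rate function could be reused for the per-country question by keying the dictionaries on country instead of year.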