Tim French CITS3200 Project - Understanding Football

Building a Natural Language Processing Interface for Understanding Football

Natural Language Processing has made significant advances in recent times, based largely on the power of deep learning techniques and neural networks. However, it is still very challenging to understand temporal constructs in language, such as descriptions of a series of events, possible future events, or an event that happened in response to an earlier event. To design systems capable of understanding temporal constructs in language, we need to build a large corpus of data for training and testing. A good candidate for such a corpus is sports events. Football (soccer) matches are a series of events in time (goals, substitutions, yellow cards, etc.). They are very common, have machine-readable descriptions readily available (e.g. https://www.api-football.com/), and have multiple simple plain English descriptions to learn over (e.g. https://www.a-league.com.au/news/live-blog-match-report-perth-glory-v-manchester-united).

This project will develop tools to assist in building, cleaning and accessing a corpus for learning how to interpret match commentary. The training examples will consist of a match overview (in plain English) and a vector describing the important events in the game. To make learning effective, player names, team names, etc. should be replaced by generic tags. The tool will allow a user to identify a game with a corresponding short match report (or possibly identify one automatically). It should then convert the match information into a uniform format for learning by normalising player information and allowing irrelevant paragraphs to be removed. The interface should provide some visualisations of games and editing tools to aid the user in building this corpus.
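
As an illustration only, one training example might be assembled along the following lines. This is a minimal sketch, not a prescribed design: the MatchEvent fields, the tag scheme (TEAM_1, PLAYER_1, ...), the event type list and the sample names are all assumptions made for this example, and a real event feed would need its own mapping into this shape.

```python
import re
from dataclasses import dataclass

# Assumed event vocabulary, taken from the examples in the brief.
EVENT_TYPES = ["goal", "substitution", "yellow_card"]


@dataclass
class MatchEvent:
    """Hypothetical record for one event in a match."""
    minute: int
    kind: str    # one of EVENT_TYPES
    team: str
    player: str


def build_tag_map(events):
    """Assign generic tags to teams and players in order of first appearance."""
    tags = {}
    team_count = 0
    player_count = 0
    for ev in events:
        if ev.team not in tags:
            team_count += 1
            tags[ev.team] = f"TEAM_{team_count}"
        if ev.player not in tags:
            player_count += 1
            tags[ev.player] = f"PLAYER_{player_count}"
    return tags


def anonymise_report(text, tags):
    """Replace every known name in a match report with its generic tag."""
    # Longest names first, so a full name is replaced before any shorter
    # name that happens to be contained in it.
    for name in sorted(tags, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        text = re.sub(pattern, tags[name], text)
    return text


def encode_events(events, tags):
    """Return events in time order as (minute, type index, team tag, player tag)."""
    ordered = sorted(events, key=lambda ev: ev.minute)
    return [
        (ev.minute, EVENT_TYPES.index(ev.kind), tags[ev.team], tags[ev.player])
        for ev in ordered
    ]


# Invented sample data, for illustration only.
sample_events = [
    MatchEvent(12, "goal", "Home FC", "A. Smith"),
    MatchEvent(30, "yellow_card", "Away United", "B. Jones"),
]
sample_tags = build_tag_map(sample_events)
print(anonymise_report("A. Smith scored for Home FC.", sample_tags))
print(encode_events(sample_events, sample_tags))
```

Keeping the tag map separate from the report text means the same mapping can be applied to both the plain English overview and the event vector, so the two halves of a training example stay consistent.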

It is not the aim of this project to actually do any natural language processing or deep learning. However, the project will provide some insight into the requirements and methods for doing NLP.

Client

Contact: Tim French
Phone: 64882794
Email: [email protected]
Preferred contact: Email
Location: CS 2.14

IP Exploitation Model

The IP exploitation model requested by the Client is: Creative Commons (open source) http://creativecommons.org.au/

Department of Computer Science & Software Engineering
The University of Western Australia
Last modified: 22 July 2019
Modified By: Michael Wise