Our exploration geologists are required to read through hundreds of publicly available documents. They can only judge the value of a document after reading it, which is time-consuming and inefficient. We want to give the geologists more information upfront so that they can filter out irrelevant documents before reading them. To achieve this, we want to investigate using large or small language models with vision capabilities to generate tags that can be added to the documents and used for filtering.
The solution should be able to determine whether a page contains text, images and/or tables. It should also be able to describe the content of each image or table.
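As a rough illustration of what such tags could look like, the sketch below defines a per-page record and a simple keyword filter over the generated descriptions. The names `PageTags` and `document_matches`, the field names, and the sample description are all hypothetical and not part of the project specification; the model call that would populate the record is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class PageTags:
    """Hypothetical per-page tag record; field names are illustrative only."""
    page_number: int
    has_text: bool = False
    has_images: bool = False
    has_tables: bool = False
    # Free-text descriptions that a vision-capable model might return.
    image_descriptions: list[str] = field(default_factory=list)
    table_descriptions: list[str] = field(default_factory=list)


def document_matches(pages: list[PageTags], keyword: str) -> bool:
    """Return True if any image or table description mentions the keyword."""
    needle = keyword.lower()
    return any(
        needle in description.lower()
        for page in pages
        for description in page.image_descriptions + page.table_descriptions
    )


# Made-up sample data, standing in for real model output.
pages = [
    PageTags(page_number=1, has_text=True),
    PageTags(
        page_number=2,
        has_text=True,
        has_images=True,
        image_descriptions=["Geological map showing drill hole locations"],
    ),
]
print(document_matches(pages, "drill hole"))  # True
```

Keeping the presence flags separate from the descriptions would let a geologist filter first on page type (for example, only documents with tables) and then on content.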
Additional Scope
We are also looking at extracting images from documents. Whilst images can be easily extracted from modern documents, older documents are provided as full-page scans, meaning a single page may contain multiple sub-images. Based on the output of the above solution, we may want to segment the page to extract the individual images.
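One possible way to think about page segmentation, shown here only as a minimal sketch, is to find connected regions in a binary mask of the page and take their bounding boxes as candidate sub-images. This assumes the scan has already been reduced to a grid in which 1 marks image content and 0 marks background; producing that mask from a real scan is a separate problem not addressed here. The function name `find_regions` and the tiny sample grid are hypothetical.

```python
from collections import deque


def find_regions(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected regions of 1s."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over this region, tracking its extent.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Made-up 4x6 "page" containing two separate image regions.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
print(find_regions(page))  # [(1, 1, 2, 2), (2, 4, 3, 5)]
```

Each bounding box could then be used to crop the corresponding sub-image from the original scan.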
Department of Computer Science & Software Engineering, The University of Western Australia. Last modified: 21 July 2024. Modified by: Michael Wise