Recently I spotted a tweet asking for someone with experience in artificial intelligence / machine learning.
It was a shout-out to see if anyone in the tech world had ideas for using AI or ML to contribute to climate science and the understanding of extreme weather.
Cognitive Services Form Recognizer API
The Form Recognizer API came to mind. This is a newer API that lives in the Cognitive Services ecosystem. Trained on as few as 5 sample documents, Form Recognizer can ingest form data from JPG, PDF or PNG files and then output structured data.
Another bonus is that the structured data returned by Form Recognizer preserves the original relationships that were present in the uploaded file.
(image source: Microsoft)
High Level Plan
I’ve been fortunate enough to be granted access to a set of PDFs containing thousands of data tables from 1873-1921, a valuable source of undigitized weather data. A rough plan could consist of the following:
- Identify sample dataset
- Provision Form Recognizer in Azure
- Create custom code to convert sample dataset to byte arrays
- Push the binary to Form Recognizer
- Process the response!
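To make the middle steps a bit more concrete, here is a minimal Python sketch of reading a sample file into bytes and wrapping it in a POST request. It is only a sketch under assumptions: the helper names `load_document` and `build_analyze_request` are made up for illustration, the analyze URL is deliberately left as a parameter because it depends on the Azure resource, API version and trained model, and I'm assuming the key is passed in the usual Cognitive Services `Ocp-Apim-Subscription-Key` header.

```python
import urllib.request
from pathlib import Path

# Content types for the file formats mentioned above (JPG, PDF, PNG).
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def load_document(path):
    """Read a sample file into a byte string and pick its content type."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"Unsupported file type: {suffix}")
    return p.read_bytes(), CONTENT_TYPES[suffix]


def build_analyze_request(analyze_url, api_key, path):
    """Build (but do not send) a POST request carrying the raw file bytes.

    analyze_url is a placeholder: the real value comes from the provisioned
    Form Recognizer resource and is not shown in this post.
    """
    body, content_type = load_document(path)
    return urllib.request.Request(
        analyze_url,
        data=body,
        headers={
            "Content-Type": content_type,
            # Assumption: standard Cognitive Services key header.
            "Ocp-Apim-Subscription-Key": api_key,
        },
        method="POST",
    )
```

Sending would then be a matter of passing the request to `urllib.request.urlopen` and reading the response, which is the part I still need to explore against the real service.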
I had some other thoughts…
- data could be pushed into an Azure DB
- create a REST API as an interface into the data (like https://data.gov.uk/ initiatives)
Going to see what’s possible with the API for this use case in the coming weeks.