Alan Turing Institute and WWF Conservation Intelligence – Data Study Group Dec 2019
The CI team were lucky enough to be a ‘challenge owner’ at the Alan Turing Institute (ATI) Data Study Group (DSG) event in December. WWF and five other challenge owners spent an intensive week with AI programmers exploring methods to resolve complex real-world challenges.
Our challenge was to find out whether we could identify, within an unstructured news API, relevant news stories that describe emerging threats, such as new infrastructure development in areas of conservation interest (e.g. protected areas), along with the actors/stakeholders involved. The aim is to provide WWF and the wider conservation sector with an open and improved data source on emerging threats.
The DSG team were truly exceptional, building out the following structure:
First, the team pre-processed the text, cleaning the data whilst also building up the keyword library via vectors. They explored various options for sentiment analysis, used Latent Dirichlet Allocation for topic modelling, and used a geoparser to identify and compare locations. For classification, they started with a baseline logistic regression classifier, which achieved 60% recall and precision, and then moved on to several deep learning approaches. The best of these was a fine-tuned version of a BERT language model, which achieved 96% recall and 82% precision. This is impressive considering the rarity of relevant articles and the small size of the initial training dataset provided to the team.
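For readers less familiar with the two metrics quoted above, here is a minimal sketch of how precision and recall are calculated for a binary "relevant / not relevant" article classifier. The function name `precision_recall` and the label lists are hypothetical and purely illustrative; they are not the DSG team's code or data.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 = relevant article."""
    # True positives: relevant articles the classifier flagged as relevant.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # False positives: irrelevant articles wrongly flagged as relevant.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # False negatives: relevant articles the classifier missed.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Precision: of everything flagged, what fraction was truly relevant?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly relevant, what fraction was flagged?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for ten articles (made up for illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

When relevant articles are rare, as they were here, high recall means few genuine threats are missed, while precision indicates how much irrelevant material a reviewer would still need to filter out.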
A full report will be made available shortly.
Special thanks to:
Principal Investigators: Kasra Hussein and Mariona Coll Ardanui
Project Facilitator: Dan S. Nielsen
Researchers: Amy Krause, Alina Miron, Barbara Metzler, Anton Baleato Lizancos, Reka Vonnak, Alejandro Coca-Castro, Victoria Auyeung, Lesley Dwyer, Selina Cho, Lucas Deecke, Leonardo Castro-Gonzalez and Katriona Goldmann