Fellow Spotlight – Andrew Thornhill 

In this Fellow Spotlight, we spoke with research botanist Andrew Thornhill.

Andrew's research focuses on large-scale molecular phylogenetic analyses of plants across a broad botanical and geographical range.

During his Fellowship with Subak Australia, Andrew is creating a curated world spatial dataset of flowering plants that will benefit conservation analyses and decisions. The project will indicate whether the occurrence records of all angiosperm (flowering) species in the Global Biodiversity Information Facility (GBIF) are native, naturalised, or neither. 

The new dataset in GBIF will serve as a reliable source of botanical information that other researchers can use for their modelling and diversity analyses. Throughout our conversation, Andrew talked about how curated data benefits flora conservation decision-making and how open-source data has the potential to accelerate collaborative climate action!

1. Tell us a bit about your background and where your passion for botany came from.

I would say that I have technically been a botanist for almost 16 years. I have known I wanted to be a botanist since I began my undergrad at Monash University in the mid-1990s. The defining point was when I had my knee replaced as an 18-year-old and realised that I probably wouldn't be able to chase animals, so I was better off trying to catch plants instead.

I started growing carnivorous plants as a hobby, and that led to a Master's project on the development of digestive glands in pitcher plants. After finishing my Master's, I moved to Canberra and did a Ph.D. on pollen morphology in the Myrtaceae. Just before I graduated, I was offered a postdoc at CSIRO in Canberra and began working in phylogenetics. It was there that I met the person who would later become my boss at the University of California, Berkeley.

Our small group began developing methods to combine phylogenies with spatial data, and we coined the term 'Spatial Phylogenetics' to describe them. After Canberra, I moved to the Australian Tropical Herbarium in Cairns and then to the US to work at UC Berkeley. I came back to Australia and was given a joint position at the State Herbarium of South Australia and the University of Adelaide. This is where I spent my Subak fellowship working on this project.

2. What is the specific project you are working on as a Fellow at Subak Australia? What datasets are you adding to the GBIF?

My project is curating a world flowering plant spatial dataset using herbarium records. Herbaria are scientific institutions that house collections of dried and pressed plants. These specimens act as a record of a plant being present at a particular location and time. Botanists have been collecting plant specimens from all over the world for over 200 years and depositing them in herbaria.

In the last 20 years, many herbaria have databased their collections and made them available through online portals such as the Global Biodiversity Information Facility. Herbaria have now contributed almost 100 million plant occurrence records to GBIF. These include erroneous records as well as records of plants occurring outside their original native range.

For this project, I combined three things: a checklist of where in the world each flowering plant species is native, a GIS shapefile of world regions coded with the same region names, and a spatial dataset of 66 million plant records. By running a series of scripts over these, I curated the occurrences down to only those within each species' native range, which resulted in a final dataset of around 27 million records.
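For readers curious what this kind of filtering involves, here is a minimal Python sketch. It is not Andrew's actual scripts, which the interview does not describe in detail. It assumes each record has a species name plus decimal longitude and latitude, that the checklist maps each species to the names of regions where it is native, and that regions are simple polygons. The region names, species names, and functions (`REGIONS`, `CHECKLIST`, `region_of`, `filter_native`) are all hypothetical; a real pipeline would load the regions from a shapefile with a GIS library rather than hard-coding toy rectangles. The discard counters mirror the reasons for discarding records that Andrew describes below.

```python
from typing import Optional

# Hypothetical toy regions: name -> polygon as a list of (lon, lat) vertices.
# A real pipeline would read these from a world GIS shapefile instead.
REGIONS = {
    "Region A": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "Region B": [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)],
}

# Hypothetical checklist: species name -> set of region names where it is native.
CHECKLIST = {
    "Species alpha": {"Region A"},
    "Species beta": {"Region A", "Region B"},
}


def point_in_polygon(lon: float, lat: float, polygon) -> bool:
    """Ray-casting test: toggle 'inside' each time an eastward ray crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def region_of(lon: float, lat: float) -> Optional[str]:
    """Return the name of the first region containing the point, or None."""
    for name, polygon in REGIONS.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


def filter_native(records):
    """Keep records that fall inside a region where their species is native."""
    kept = []
    discarded = {"no_geocode": 0, "name_not_in_checklist": 0, "outside_native_range": 0}
    for rec in records:
        lon, lat = rec.get("lon"), rec.get("lat")
        if lon is None or lat is None:
            discarded["no_geocode"] += 1  # cannot be matched to any region
            continue
        native = CHECKLIST.get(rec["species"])
        if native is None:
            discarded["name_not_in_checklist"] += 1  # name mismatch
            continue
        # Points in a non-native region, or in no region at all, are dropped here.
        if region_of(lon, lat) in native:
            kept.append(rec)
        else:
            discarded["outside_native_range"] += 1
    return kept, discarded


if __name__ == "__main__":
    records = [
        {"species": "Species alpha", "lon": 5.0, "lat": 5.0},    # native in Region A
        {"species": "Species alpha", "lon": 15.0, "lat": 5.0},   # Region B: not native
        {"species": "Species beta", "lon": None, "lat": None},   # no geocode
        {"species": "Species unlisted", "lon": 5.0, "lat": 5.0}, # not in checklist
    ]
    kept, discarded = filter_native(records)
    print(len(kept), discarded)
    # prints 1 {'no_geocode': 1, 'name_not_in_checklist': 1, 'outside_native_range': 1}
```

Keeping a count for each discard reason, rather than silently dropping records, makes it easier to see afterwards why so much of a starting dataset is removed.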

3. What has been the most interesting finding whilst looking into the data of angiosperm species on the GBIF? 

The only significant finding so far is that almost two-thirds of the records that I started with were discarded after running all of the scripts. This means that much of the data in GBIF does not reflect the native range of plants. There are a few possible reasons for this. 

First, the record might not have a geocode, so it was culled when it was matched to the world GIS shapefile. Second, the record might represent an occurrence of a plant outside its native range, such as a cultivated plant, a naturalised occurrence, or an erroneous geocode entry. Third, the name on the record might not have matched a name in the checklist (or vice versa), so it was cut from the final dataset. The dataset is going to be combined with other data too, but I can't give away the results of that study until it is published.

4. How will the records you are creating enhance the protection and recovery of flora in the future?

The most important thing that the curated records will do is provide non-experts with a dataset that they can use with trust. It will also remove the need for other researchers to download the data and clean it themselves, saving them hours or even days of work. They can then concentrate on the analysis that they want to perform, such as assessing diversity in an area or determining whether a plant occurs in a location. Modellers can also use this dataset to more reliably map where species may occur, both now and into the future.

5. Finally, what benefits do you believe we will start to see from having more open-source data readily available, particularly from your experience as an academic?

The benefit should be more reliable results. Having datasets that you know are correct means that you can have more trust in the results you get from them. It will save many researchers time because they won't have to clean the data themselves. Datasets such as the angiosperm one will also provide a baseline that we can continue to improve on.

Cleaned dataset of world angiosperms (flowering plants)

Completed by Andrew during his fellowship.
