the GiGLer

The newsletter of Greenspace Information of Greater London CIC

Data validation & verification

Ian Woodward, GiGL Royal Parks Officer

The accuracy of the records we hold is essential. Our partners and other GiGL data users often base planning and conservation decisions on our data, and prioritise their work accordingly. They rely on the accuracy of that data to make sure that those decisions are appropriate and effective. With over three quarters of a million records in the database, ensuring the accuracy of every record is a daunting task. So how do we do it?

There are two stages to checking a dataset, reflecting the fact that there are two ways that errors can occur. These two stages are known as data verification and data validation. These two words can cause confusion as their meaning depends on the context – they are sometimes understood to mean the same thing. In biological records, however, there is a very clear difference.

The National Biodiversity Network defines data verification as ‘ensuring the accuracy of the identification of the things being recorded’, and data validation as ‘carrying out standardised, often automated checks on the “completeness”, accuracy of transmission and validity of the content of a record’.

Put simply, validation checks every aspect of a record except the species identification: who made the observation, where it was made and when. The identification, the ‘what’, is checked during verification. Verification is carried out by species experts, validation by data experts.

Data validation

Data validation is a largely automated process that takes place when the data is imported to Recorder. Some validation checks, such as making sure that grid references and locations match or that dates are valid, are built into the software. In other cases, manual entry of data makes it possible to link records to pre-existing locations, avoiding the need to type grid references each time and removing the possibility of typing errors.
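The kind of automated check described above can be sketched in a few lines of Python. This is only an illustration and not how Recorder itself works: the field names (`recorder`, `grid_ref`, `date`), the DD/MM/YYYY date format and the simplified grid reference pattern (two grid letters followed by an even number of digits) are all assumptions made for the example.

```python
import re
from datetime import date, datetime

# Assumed, simplified pattern: two grid letters (the letter I is not used)
# followed by an even number of digits, from 2 to 10.
GRID_REF = re.compile(r"[A-HJ-Z]{2}(?:\d\d){1,5}")


def validate_record(record, today=None):
    """Return a list of problems found in a record; an empty list means it passed."""
    today = today or date.today()
    problems = []

    # Completeness: the 'who', 'where' and 'when' must all be present.
    for field in ("recorder", "grid_ref", "date"):
        if not (record.get(field) or "").strip():
            problems.append(f"missing {field}")

    # 'Where': the grid reference must be well formed (spaces are ignored).
    grid_ref = (record.get("grid_ref") or "").replace(" ", "").upper()
    if grid_ref and not GRID_REF.fullmatch(grid_ref):
        problems.append("malformed grid reference")

    # 'When': the date must exist in the calendar and must not be in the future.
    raw_date = (record.get("date") or "").strip()
    if raw_date:
        try:
            observed = datetime.strptime(raw_date, "%d/%m/%Y").date()
        except ValueError:
            problems.append("invalid date")
        else:
            if observed > today:
                problems.append("date in the future")

    return problems
```

A record that passes returns an empty list, while a record with a blank recorder name and a date of 31/02/2007 would be reported as having a missing recorder and an invalid date. Note that this sketch only checks that a grid reference is well formed; checking that it matches a named location would need a list of sites and their boundaries.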

For complicated data sets, further manual validation may occasionally be necessary. Prior to importing a spreadsheet, its layout and contents will be checked. After import, a manual check will be made to ensure that the data points are in the correct place on a Geographical Information System (GIS) map. The validation process can benefit both GiGL and the recorders who make their data available through us. After validation GiGL can return a ‘clean’ data set to the data providers. ‘A side benefit of providing LNHS records to GiGL was that it gave me more confidence in both insights into the working of their validation systems and the integrity of my own data.’ (Rodney Burton, London Natural History Society, one-time vascular plant recorder.)

Data providers should be aware that data validation is not able to identify all potential errors in a dataset. For example, if a date of 15/06/2007 is incorrectly given as 15/07/2007, this may not be picked up by validation as both dates are valid. Data providers should continue to take care to check their data prior to submitting it to GiGL.
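The date example can be demonstrated with a short Python sketch. It assumes dates are written as DD/MM/YYYY, and the function name `is_valid_date` is made up for the illustration. Both the intended and the mistyped date pass the check, so an automated test of validity alone cannot tell them apart.

```python
from datetime import datetime


def is_valid_date(text):
    """Return True if text is a real calendar date in DD/MM/YYYY format."""
    try:
        datetime.strptime(text, "%d/%m/%Y")
        return True
    except ValueError:
        return False


intended = "15/06/2007"   # the date the recorder meant
mistyped = "15/07/2007"   # the date that was actually typed

# Both are real dates, so both pass; the typing error goes unnoticed.
both_pass = is_valid_date(intended) and is_valid_date(mistyped)

# Only an impossible date, such as 31 June, is caught.
impossible_caught = not is_valid_date("31/06/2007")
```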

Data verification

Once the data has been validated, data verification makes sure that the identification is correct. With such a diverse range of species and habitat information from such a diverse range of sources, the GiGL team alone are not able to judge the reliability of any individual record. It requires specialist knowledge and many years of experience to be able to judge the likely accuracy of an observation.

Here, GiGL calls upon the skills of experienced recorders to assess records for each species group. In some cases, regional or national panels exist to decide whether a record is acceptable. The panels may request further information from the observer and make a decision based on their knowledge of the species involved. Certain species, such as some invertebrates, are so difficult to identify that a record may only be accepted after a specimen has been collected and checked.

If a record is deemed to be a mis-identification or is considered insufficiently proven, it will either be re-attributed to a different species or be marked as unverified and excluded from any GiGL reports.

We are aided in this process by the Recorder’s Advisory Group, or ‘RAG’, a panel of experts from various organisations including the Greater London Authority, the London Natural History Society and the Natural History Museum, each of whom specialises in particular species groups and habitats. Clearly, verifying our many records one by one would take a considerable length of time. Over the last few months, the RAG has been discussing whether GiGL can help reduce the task by using automated processes to identify high-risk observations, and so provide fewer records for the experts to review. Records of species that are common and easy to identify, and those from trusted sources where in-house verification has already been carried out, may not need further verification by our experts. Each species group has its own challenges. Verifying records of highly visible species with many observers is very different from reviewing records for more secretive species studied mainly by specialists.
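As a rough illustration of the idea under discussion, the Python sketch below filters a list of records down to those that might still need expert review. It is not an agreed GiGL or RAG process: the species and source lists are hypothetical placeholders, as are the field names `species` and `source`, and in practice the rules would differ for each species group.

```python
# Hypothetical placeholder lists; real lists would be agreed with the
# experts for each species group.
EASY_TO_IDENTIFY = {"Robin", "Mute Swan"}
TRUSTED_SOURCES = {"Example Recording Group"}


def needs_expert_review(record):
    """Return True if a record should still be passed to an expert."""
    # Common, easily identified species may not need further verification.
    if record["species"] in EASY_TO_IDENTIFY:
        return False
    # Nor may records already verified in-house by a trusted source.
    if record["source"] in TRUSTED_SOURCES:
        return False
    return True


def select_for_review(records):
    """Keep only the higher-risk records for the experts to look at."""
    return [r for r in records if needs_expert_review(r)]
```

With rules like these, a Robin record from a member of the public and any record from a trusted source would be set aside, leaving only the remaining records for the experts to review.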

Over the coming months, GiGL will continue to work closely with our expert advisers to ensure that appropriate measures are in place for each species group and habitat. GiGL’s partners will be able to have even greater confidence that all of GiGL’s records have been both thoroughly validated by automated processes and verified by external specialists.
