Open Data

The open data movement is inspiring and exciting, but for it to work for our sector it needs long-term and significant investment in our local, regional and national data infrastructure.

The Open Data Institute (ODI) defines open data as “data that anyone can access, use or share”. The Data Spectrum devised by the ODI, illustrated below, lays out the language of data to help us understand different levels of access and licensing.

GiGL operates under a shared data business model, which means our detailed and up-to-date data are publicly accessible via our free and charged services, but end uses are restricted by licensing terms to protect data flow and our business. We also publish open data to act as a shop window for our services. For more information on GiGL’s stance on open data, please see our GiGLer article here.

You can view GiGL’s full open data policy here.

GiGL’s Open Data

GiGL publishes datasets using open data licences to improve consumers’ understanding of London’s natural environment and raise the profile of our services. The open datasets are blurred spatially, restricted temporally, or are less detailed than the high-quality datasets that underpin our services. We do not publish open data without express permission from the data owners, and we notify all partners of open data publication as part of our ongoing services.
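To illustrate what spatial blurring can mean in practice, the sketch below snaps a point to the south-west corner of the grid square that contains it, so that only the square, and not the exact location, is published. It is a minimal illustration and not a description of GiGL's actual publication process: the function name `blur_to_grid`, the use of eastings and northings in metres, the sample point and the cell sizes are all assumptions made for the example.

```python
def blur_to_grid(easting: float, northing: float, cell_size_m: int) -> tuple[int, int]:
    """Return the south-west corner of the grid cell containing the point.

    Hypothetical helper: assumes coordinates are eastings/northings in metres.
    """
    if cell_size_m <= 0:
        raise ValueError("cell_size_m must be positive")
    blurred_easting = int(easting // cell_size_m) * cell_size_m
    blurred_northing = int(northing // cell_size_m) * cell_size_m
    return blurred_easting, blurred_northing


# An illustrative point (not a real record), blurred to 2 km and 10 km squares.
point = (533456.0, 187654.0)
blurred_2km = blur_to_grid(*point, cell_size_m=2_000)
blurred_10km = blur_to_grid(*point, cell_size_m=10_000)
print(blurred_2km, blurred_10km)
```

The larger the cell size, the more locations collapse onto the same published coordinate, which is the trade-off between openness and detail that the paragraph above describes.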

At GiGL, the licences we use for our open datasets are the OGL (Open Government Licence) and CC-BY (Creative Commons Attribution licence). The only restriction these licences place on consumers is that they must give attribution to GiGL as the data source; consumers are otherwise free to republish the corresponding data and to make money from them.

A full catalogue of GiGL’s current open data can be found in the table below, along with some alternative sources of related open datasets and their potential issues. GiGL’s complete, current and most accurate versions of these datasets are available under licence via a Service Level Agreement (SLA).

Dataset GiGL Open Data Alternative Sources of Open Data and their Potential Issues
Sites of Importance for Nature Conservation (SINCs) Via the London Datastore. This is updated on an annual basis with a version from the previous year, with reduced attributes
Public Open Spaces (POS) Due to release open version by the end of 2023-24 financial year Some available via local authorities but may not be as comprehensive as GiGL
Areas of Deficiency in Access to Nature (SINC AoD) and Areas of Deficiency in Access to Public Open Space (POS AoD) Due to release some statistics regarding AoD per borough or ward 
800m AoD Via the GLA website at polygon level
Privately Owned Public Spaces (POPS) Via the London Datastore
Open Space Spaces to Visit and Friends Group subsets via the London Datastore. Both are updated on a quarterly basis Partial data available from local authorities but may have poor coverage
Habitats Natural England Priority Habitats layer. Only holds priority habitats and may be out of date or incomplete in many areas
Species Species records will be added to the NBN Atlas (with permission) at 10km/2km resolution to act as a signpost for better resolution data held by GiGL See below for full list of 3rd party open species data. Some issues include: use restrictions; lower resolution; lack of verification/validation by GiGL verifiers; differing formats make it difficult to compile if from more than 1 source; potential duplications
Trees Some data are currently available on the NBN Atlas. This dataset has not been updated or added to since it was uploaded in 2018. It is an amalgamation of GLA habitat survey data and LB street tree data
Green Belt and Metropolitan Open Land (MOL) Via the London Datastore from the GLA. GiGL update and manage Greater London’s Green Belt and MOL data so these alternative datasets may be out of date and incomplete
Statutory Sites (SSSI, SAC, SPA, Ramsar, NNR, LNR) LNRs are available via Natural England. We review this dataset and add missing sites. The revised data are made available by GiGL services
Datasets stewarded by GiGL and how to access them, as well as alternative sources and their potential limitations

Evaluating External Open Datasets and Platforms

Data Management Association UK (DAMA) devised 60 criteria for assessing data quality. GiGL have selected the 10 most relevant of these to evaluate a selection of non-GiGL open datasets. The criteria are as follows:

Criterion | Definition | Example questions for assessment
Completeness | The degree to which all attributes, records, data values and metadata are present and fully described | Does the metadata describe data origins, attributes, limitations etc.? Does the dataset have full coverage for the region it should?
Consistency | The degree to which data values of two sets of attributes within the data comply with a rule | Are there differences in how one data value is recorded over another?
Licensing | The degree to which appropriate and reliable licensing is defined | What are the restrictions for use and publication?
Longevity | The degree to which the dataset has been kept up to date and its update frequency | Is the dataset current and how frequently will it be refreshed?
Precision | The degree of accuracy with which data values are recorded or classified | How precise is the data relative to the real-world entities being recorded?
Reputation | The degree to which data are trusted or highly regarded in terms of their source or content | Is the data from a trusted, reliable source and has it been validated and verified by experts?
Timeliness | The degree to which the period between the time of creation of the real value and the time that the dataset is available is appropriate | How current is the data in relation to reality and its likely changeability?
Uniqueness | The degree to which records occur only once within a dataset | Are there multiple sources of the same data values?
Usability | The degree to which data can be accessed and understood by data consumers | How easy is it to access the data? Can it be easily understood and interpreted?
Validity | The degree to which data values comply with rules | Do the data values conform to their definition?
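Some of these criteria can be checked mechanically before a dataset is reviewed by a person. The sketch below scores completeness (the share of required fields that are filled in) and uniqueness (the share of records that are not repeats) for a small list of records. It is a simplified illustration only: the field names, the sample records and the scoring formulas are assumptions made for this example, not DAMA's or GiGL's definitions.

```python
from collections import Counter


def completeness(records: list[dict], required_fields: list[str]) -> float:
    """Fraction of required field values that are present and non-empty."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def uniqueness(records: list[dict], key_fields: list[str]) -> float:
    """Fraction of records that are the first occurrence of their key."""
    if not records:
        return 1.0
    keys = Counter(tuple(record.get(f) for f in key_fields) for record in records)
    return len(keys) / len(records)


# Hypothetical sample records; the field names are illustrative only.
FIELDS = ["species", "date", "grid_ref"]
sample = [
    {"species": "Apus apus", "date": "2023-06-01", "grid_ref": "TQ3080"},
    {"species": "Apus apus", "date": "2023-06-01", "grid_ref": "TQ3080"},  # repeat
    {"species": "Bufo bufo", "date": "2023-04-12", "grid_ref": ""},  # missing grid ref
    {"species": "Rana temporaria", "date": "2023-03-20", "grid_ref": "TQ2979"},
]

completeness_score = completeness(sample, FIELDS)
uniqueness_score = uniqueness(sample, FIELDS)
print(f"completeness={completeness_score:.2f}, uniqueness={uniqueness_score:.2f}")
```

Criteria such as reputation or licensing do not reduce to a score like this and still need a manual assessment.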

Open Habitat and Site datasets

The table below compares two open datasets – Living England Habitats and Ancient Woodland Inventory.

Open habitat and site datasets

Dataset | Living England Habitats | Ancient Woodland Inventory
Data description | A habitat probability map for the whole of England, created using satellite imagery, field data records and other geospatial data in a machine learning framework. The map shows the extent and distribution of broad habitats across England, providing a valuable insight into our natural capital assets and helping to inform land management decisions. | A spatial dataset that describes the geographic extent and location of ancient woodland (AW) habitat in England (excluding the Isles of Scilly). AW is land that has had a continuous woodland cover since at least 1600 AD. It includes Ancient Semi-Natural Woodland (ASNW), which retains a native tree and shrub cover, Plantation on Ancient Woodland Sites (PAWS) where the original tree cover has been felled and replaced by planting, or Ancient Wood Pasture (AWP) where the trees are managed in tandem with a long established tradition of grazing. AW is identified using old maps and information including names, shapes, internal boundaries, locations relative to other features, ground surveys, and aerial photography. The Inventory Database records grid references, areas (ha) and the proportion that is semi-natural or replanted.
Data format(s) | Online map, MAGIC, download of SHP, TAB, WMS and WFS | Online map, MAGIC, download of SHP, TAB and WFS
Currently part of GiGL data? | No | No

Evaluation Criteria

Criterion | Living England Habitats | Ancient Woodland Inventory
Completeness | Contiguous across England at broad habitat level. Metadata complete | Known omissions and inaccuracies for Greater London. Only includes sites of 2ha and above. Smaller fragmented sites are still crucial habitats and especially important in urban environments such as London. Metadata complete
Consistency | High; modelled across England using nationally available data | There is a comprehensive methodology, with internal experts in NE involved in verifying amendments. Consistency will improve with the update project. To some degree, aspects of decision-making will always remain subjective due to the difficulty of interpreting historic maps and the scarcity of evidence for some sites
Licensing | Open Government Licence | Open Government Licence
Longevity | An ongoing project, which hopes to update the map every two years | Current AWI Update project happening nationally with an end date of March 2024
Precision | Spatial resolution of 20-50 metre squares | Spatial resolution of 10-100 metres
Reputation | From a trusted source | From a trusted source
Timeliness | Mixed; age of data sources and rate of change in habitat type varies so some will be more up-to-date than others | Mixed; ancient woodlands are classed as irreplaceable habitat due to the length of time it takes for them, and the communities that rely on them, to form. That said, woodlands can be felled or fail to reach ASNW status due to degradation
Uniqueness | Mixed; combination of new and existing data sources | No repeating data and unique in the sense that it is the only available inventory of this habitat
Usability | Can be used to help inform a wide range of applications, including: environmental policy decision making; national habitat extent and connectivity assessments for targeting nature recovery; assessment of natural capital assets; ecosystem service modelling; updating the evidence base for key policy areas such as ELM | As a record of an irreplaceable habitat, this dataset is of critical importance and should be used frequently, e.g. for: meeting planning policy requirements (NPPF, BNG); supporting sustainable management and restoration schemes; contributing to landscape-scale plans (NRNs)
Validity | Accuracy varies between different habitats and regional zones – phase IV achieved an average habitat classification accuracy of 88%. The map has some known under-mapped urban areas, with major roads, airports, car parks and dockland areas being classified under a number of other habitat types. This mainly affects habitat predictions around urban areas for the following broad habitat types: Broadleaved, Mixed and Yew Woodland; Coastal Sand Dunes; Bare Sand; Dwarf Shrub Heath; Acid, Calcareous and Neutral Grasslands. The Living England team are developing solutions for these and are looking to improve in the next iteration of the map due for release in 2024 | There are known issues with the dataset, with omissions and inaccuracies regularly reported. This can involve sites being included that shouldn’t be, as well as sites with boundary issues, or sites (often smaller ones) missing. Some counties have already been updated, while others still only have the original 1980s findings with ad hoc amendments

GiGL’s alternative

Living England Habitats | Ancient Woodland Inventory
GiGL is currently working on improving our own habitat dataset that will be vital for use in projects involving Biodiversity Net Gain (BNG) and Local Nature Recovery Strategies (LNRS). Using on-the-ground knowledge and GIS, evidence-based conclusions can be made about habitats and their biological significance. | GiGL is currently working on updating the AWI, assisted by national partners such as Natural England and Woodland Trust, to create a thorough and complete record of all Ancient Woodland in London. You can read more about this project here.

Species Data

The tables below describe a number of datasets that contain open species data.

NBN Atlas

Data description All species
Download resolution Depends on dataset
Currently part of GiGL data? Not directly, but many datasets on the Atlas are incorporated earlier in the data chain
Potential Issues Licensing: Not all data open (e.g. some not for commercial purposes)
Precision: Not all data displayed at capture resolution
Reputation: Records not necessarily verified as they come from varying sources
Uniqueness: An end repository, so duplication likely if also accessing other data sources

iRecord

Data description All species
Download resolution Recorders only have full access to their own records. A site species list can be downloaded, but not individual records. Full download resolution (unless confidential) is available to a LERC, subject to licences
Currently part of GiGL data? Should be fully integrated by early next year
Potential Issues Usability: Individual records not available to general users

iNaturalist

Data description All species
Download resolution Depends on dataset
Currently part of GiGL data? Those which have reached Research Grade are shared with iRecord and accessed there
Potential Issues Reputation: Even Research Grade records are not necessarily correct or verified

RSPB Swift database

Data description Swifts only
Download resolution Full resolution (1m²)
Currently part of GiGL data? Yes
Potential Issues Usability: Limited to swifts, which raises potential issues when also using other data sources

Record Pool

Data description Amphibian & Reptiles
Download resolution Full download resolution as a LERC; 1km (non-sensitive) & 10km (sensitive) download resolution as a Record Pool User
Currently part of GiGL data? Yes
Potential Issues Licensing: Not all data available to general users

GBIF

Data description All species
Download resolution Depends on dataset
Currently part of GiGL data? Not directly, but various datasets are accessed directly from sources earlier in the data flow chain
Potential Issues Completeness: Not all available records are on GBIF
Uniqueness: An end repository, so duplication likely if also accessing other data sources
Licensing: Not all open (e.g. some not for commercial purposes)

MAGIC

Data description Various
Download resolution Depends on dataset
Currently part of GiGL data? Not directly, but various datasets are accessed directly from sources earlier in the data flow chain
Potential Issues Completeness: Limited datasets available
Uniqueness: Chance of duplication high due to data flow
Precision: Data displayed at a lower resolution

Data.gov

Data description Various
Download resolution Depends on dataset
Currently part of GiGL data? Not directly, but various datasets are accessed directly from sources earlier in the data flow chain
Potential Issues Uniqueness: Datasets also available elsewhere
Completeness: Limited datasets available
Usability: Can only download individual datasets

BSBI

Data description Plant database
Download resolution 2km for a general user
Currently part of GiGL data? Not directly, but various datasets are accessed directly from sources earlier in the data flow chain
Potential Issues Licensing: Downloads unavailable for general users, though maps can be accessed at 2km resolution