This tutorial gives a step-by-step guide on how to calculate the Biodiversity Intactness Index (BII) using the PREDICTS database. We’ll go through where to find the PREDICTS data, how we go about analysing this data and finally how to project and calculate BII, using R. We’ll be using just a subset of the data and a very simple model to start with.


PREDICTS – Projecting Responses of Ecological Diversity In Changing Terrestrial Systems – is a collaboration that aims to model how local terrestrial biodiversity worldwide responds to land use and related pressures, and to use such models to project how biodiversity may change under future socioeconomic scenarios. Expand to learn more The PREDICTS team have collated a large database of biodiversity data from over 29,000 sites worldwide. These data have been provided by authors of over 700 separate surveys, each of which sampled biodiversity at multiple sites facing different land-use and related pressures. Although the different surveys have used a very wide range of different techniques, and focused on a wide array of different plant, fungal, invertebrate or vertebrate taxa (the database now holds data from over 50,000 species), the data from different sites within a survey are comparable, having been collected in the same way. Each site’s land use and land-use intensity has been classified into a consistent set of categories, based on information in the paper or from the authors, and some other human pressures (e.g., human population density) are estimated for each site from global rasters. Details of how the database was put together – including definitions of our land-use and use-intensity classes – can be found in this paper; the first released version of the database is available from the Natural History Museum’s data portal, and is described in this paper.

Because the database holds the original biodiversity data, statistical analysis can model a wide range of response variables (as explained in this paper), including site-level measures of diversity (such as within-sample species richness, rarefaction-based richness or overall organismal abundance), among-site differences in community composition, or the occurrence and/or population size of each species. Combining our data with other species-level information permits modelling of functional and phylogenetic diversity or of a community-weighted measure of geographic range size. This walkthrough document focuses on modelling two response variables: total site-level abundance, and compositional similarity between sites.

Because the surveys in the database arise from such very different methodologies, we fit mixed-effects models with survey-level random effects, allowing us to focus on how human pressures affect the response variable while acknowledging the among-survey heterogeneity. We assume that biotic differences among matched sites with different land uses are caused by the land-use difference, a form of space-for-time substitution. Some of our models also assume that the biota at sites with minimally-used primary vegetation (and minimal levels of other pressures) approximates their pristine biota. These assumptions would not be needed if representative long-term temporal data were available.

Models can be fitted to the whole global data or to regions of particular interest (such as tropical and subtropical forest biomes). The models can be combined with detailed spatiotemporal data on the pressures to map the projected current state of the response variable, estimate how it has changed in the past, or project its future under alternative scenarios.

About BII

The Biodiversity Intactness Index (BII) was initially proposed in 2005 as a sound, sensitive, easily understood and affordable biodiversity indicator that could easily be applied at any spatial scale and would allow for comparison with a policy target and a baseline. BII is defined as the average abundance, across a large and functionally diverse set of species, relative to their reference populations (which would ideally be populations before any impacts of modern industrial society, but which practically have to be populations in the least impacted settings available); non-native species are excluded from the calculation. Expand to learn more

BII became more prominent with its adoption in the 2015 revision of the Planetary Boundaries framework as an interim measure of biosphere integrity. The framework proposed that reduction of average BII to below 90% across a large region such as a biome would risk large-scale disruption of the flow of ecosystem services and jeopardise sustainable development, though the paper acknowledged that the precise placement of the ‘safe limit’ for BII was very uncertain.

In the absence of sufficient collated biodiversity data, BII was initially estimated using carefully-structured expert opinion. The PREDICTS team first estimated BII based on primary biodiversity data in 2016, by combining two statistical models – one of site-level organismal abundance, and one of compositional similarity to a site still having primary vegetation. The latter model was needed to account for the fact that models of overall organismal abundance do not consider turnover in species composition. Although we gave several reasons why our estimates of BII were likely to be overoptimistic, our 2016 estimates nonetheless placed the world, nearly all biomes and nearly all biodiversity hotspots below the proposed ‘safe limit’ for BII.

We have continued to refine the modelling framework to improve our estimates of BII. The most important improvements since the 2016 paper have been:

  • Use of a more stringent baseline in models of compositional similarity. Whereas the 2016 paper used all primary vegetation sites (even those with intense human use) as the baseline land use, growth of the database and a switch to a more efficient (matrix-based) model framework have allowed us to use only minimally-used primary vegetation as the baseline.
  • A more principled transformation of the compositional similarity estimates prior to modelling. Although the log-transformation used in the 2016 paper produced acceptable model diagnostics, it does not recognise the bounded nature of compositional similarity (which can range from 0 to 1). We now use a logit transformation instead, which provides more sensitive discrimination among land uses.

The estimates of BII that result from the improved framework tend to be markedly lower than those we obtained in 2016. One issue that remains to be addressed is that the land-use rasters we have been using to make spatial projections do not differentiate planted from natural forest, meaning our estimates are still likely to be too high in regions with extensive planted forest. Work to address this shortcoming is underway.

BII, being an indicator of the average state of local ecological communities, complements indicators based on species’ global conservation status (such as the Red List Index), or on population trends (such as the Living Planet Index). These different facets of biodiversity are all important, and can be combined to provide a roadmap towards restoring global biodiversity.

Load some packages

library(dplyr) # for easy data manipulation
library(tidyr) # ditto
library(magrittr) # for piping
library(lme4) # for mixed effects models
library(car) # for logit transformation with adjustment
library(raster) # for working with raster data
library(geosphere) # calculating geographic distance between sites
library(foreach) # running loops
library(doParallel) # running loops in parallel

Prepare the biodiversity data

You can download the PREDICTS data from the Natural History Museum data portal. If you’re working in R, I would strongly suggest downloading database.rds (the database is very large, and the rds file is much quicker to load in than the csv file).

Once you’ve downloaded the database, read it in. I’m going to filter the data for just the Americas, to make the data manipulation and modelling a bit quicker. You can calculate BII for any region of the world for which there are data, although we tend to do our BII modelling on a global scale or at least across multiple biomes.

# read in the data
diversity <- readRDS("database.rds") %>%
  # now let's filter out just the data for the Americas
  filter(UN_region == "Americas")

## Observations: 825,682
## Variables: 67
## $ Source_ID                               <fct> AD1_2005__Shuler, AD1_...
## $ Reference                               <fct> Shuler et al. 2005, Sh...
## $ Study_number                            <int> 1, 1, 1, 1, 1, 1, 1, 1...
## $ Study_name                              <fct> Shuler2005_flowervisit...
## $ SS                                      <fct> AD1_2005__Shuler 1, AD...
## $ Diversity_metric                        <fct> effort-corrected abund...
## $ Diversity_metric_unit                   <fct> effort-corrected indiv...
## $ Diversity_metric_type                   <fct> Abundance, Abundance, ...
## $ Diversity_metric_is_effort_sensitive    <lgl> FALSE, FALSE, FALSE, F...
## $ Diversity_metric_is_suitable_for_Chao   <lgl> FALSE, FALSE, FALSE, F...
## $ Sampling_method                         <fct> systematic searching, ...
## $ Sampling_effort_unit                    <fct> day, day, day, day, da...
## $ Study_common_taxon                      <fct> Hymenoptera, Hymenopte...
## $ Rank_of_study_common_taxon              <fct> Order, Orde