Deforestation in Borneo: a GIS case study

Dec 15, 2019

Smooth animated maps and fast ways to calculate geographical distances in R.

TL;DR

  • The tweenr library creates smooth animations by interpolating data points. However, its data format makes it tricky to use for (raster) maps;
  • the velox package dramatically improves rasterisation speeds for shape files as well as for a small number of related operations;
  • By using efficient nearest-neigbour calculations, it is possible to compute complex distance calculations orders of magnitude faster than using the usual raster::buffer or raster::distance;
  • Forest loss is catastrophic in some parts of the Sabah district of Borneo, and is driven by proximity to roads;
  • Protected areas have been so far largely shielded from deforestation in this region.

Introduction

We focus on the Sabah region of Malaysia on the island of Borneo. The area produces excellent tea (which is how I got to know it), but is also subject to intense deforestation, logging and mining. For this analysis, I will use the study of deforestation from the years 2000 to 2018 to introduce some nice methods for fast distance calculation on geographical rasters.

Getting to know Sabah

We extract Sabah and its administrative districts using the maptools package, and generate a nice plot:

library(maptools)
library(raster)
adm=getData('GADM', level=2, country="Malaysia")

plot(adm[adm$NAME_1=="Sabah",])
text(coordinates(adm), adm$NAME_2)

tongod=adm[adm$NAME_2=="Tongod",]
Administrative regions of Sabah province, Malaysia.
Administrative regions of Sabah province, Malaysia.

For the remainder of this post, we will focus on the central Tongod district.

The administrative region data is a polygon dataset. To study forest cover and loss, we will use the renowned Hansen dataset, which is freely available online in raster form. While shape files can be thought of as vector images describing abstract geometric shapes, rasters are pixellated images of a map, where each pixel’s color represents a value of interest: elevation, presence of a species, or indeed, percentage of tree cover. We need to transform the polygons to align them with the projection used for the forest cover datasets. We get the 2000 tree cover data from this address:

gc()
library(rgdal)
cover00=raster("K:/Downloads/Hansen_GFC-2017-v1.5_treecover2000_10N_110E.tif")
proj=projection(cover00)
tongod=spTransform(tongod, CRS(proj))

You can also use the convenient gfcanalysis package to automatically download the Hansen files you need:

library(gfcanalysis)
tiles=calc_gfc_tiles(tongod)
data_folder='.'
download_tiles(tiles, data_folder)

and so on as per the package’s help page. We don’t use it because it slows things down by using files at every step. Since we are not in th 90’s anymore, we opt for an approach that does most operations on the fly in memory. These datasets are huge and not practical to work with at this scale. We first crop, then subset the tree cover datasets to the Tongod region. We use the fantastic package velox that speeds things up by an order of magnitude:

gc()
library(velox)
cover00.tongod.ext=crop(cover00,extent(tongod))
blank=cover00.tongod.ext
values(blank)=0
vx=velox(blank)
vx$rasterize(tongod, "ID_1")
blank=vx$as.RasterLayer()
values(blank)[values(blank)==0]=NA
values(blank)[!is.na(values(blank))]=values(cover00.tongod.ext)[!is.na(values(blank))]

# this thresholds the raster at 80% coverage
values(blank)[values(blank)<0.8]=0
values(blank)[values(blank)>0.8]=1

Visualising loss per year

Now we load the forest cover loss per year:

gc()
loss=raster("K:/Downloads/Hansen_GFC-2017-v1.5_lossyear_10N_110E.tif")
loss.tongod.ext=crop(loss, extent(tongod))

We use the animation library and its helpful saveGIF function. We also downsample the raster to make calculations and plotting faster:

library(animation)
library(RColorBrewer);
library(data.table)

smaller=aggregate(blank, 5, FUN=min)

gc()
loss.tongod.smaller=aggregate(loss.tongod.ext, 5, FUN=min)
saveGIF({
    plot(smaller, box=F,legend=F, axes=F,col=brewer.pal(10, "Greens"));
    mtext("2000", cex=2, line=2);
    for(i in 2:18){
        test=smaller;
        values(test)[values(loss.tongod.smaller)<i & values(loss.tongod.smaller)>0]=0;
        plot(test, box=F,legend=F, axes=F,col=brewer.pal(10, "Greens"));
        mtext(2000+i-1, cex=2, line=2)
    }
},ani.width=800, ani.height=800, interval=0.2)

This gives the following:

Forest loss in the Tongod district of Sabah region, choppy animation.
Forest loss in the Tongod district of Sabah region, choppy animation.

Making it smooth with tweenr

Currently we have a choppy image. But we can interpolate if we want smoother transitions. For this we use the tweenr library. Unfortunately, it can’t work with matrices, so we have to do some data wrangling to convert it to a data table. Beware, this takes time and memory:


## create an initial data table for year 2000 containing tree cover
df=as.data.table(coordinates(smaller))
df$id=paste(df$x,df$y)
df$value=values(smaller)
df$x=NULL;df$y=NULL;df$year=00
df=df[!is.na(df$value),]

## for each subsequent year, we append that year's loss data to the data table
for(i in 16:0){
    gc()
    print(i);
    flush.console();
    coveri=smaller;
    values(coveri)[!is.na(values(coveri)) & values(loss.tongod.smaller)>0 & values(loss.tongod.smaller)<(18-i)]=0;
    plot(coveri)
    dfapd=as.data.table(coordinates(coveri));
    dfapd[, id:=do.call(paste,.SD), .SDcols=c("x","y")];
    dfapd$value=values(coveri);
    dfapd$x=NULL;dfapd$y=NULL;
    dfapd=dfapd[!is.na(dfapd$value),]
    dfapd$year=17-i;
    df=rbind(df,dfapd)
}

## this is the "ease" parameter that defines how values are interpolated.
df$ease="linear"


## We interpolate between 2000 and 2001
interp=tween_elements(df[df$year %in% c(0,1),], "year", group="id", ease="ease", nframes=5)
interp=as.data.table(interp)
interp[,c("x", "y") := tstrsplit(.group, " ", fixed=T)]
interp$x=as.numeric(interp$x)
interp$y=as.numeric(interp$y)
interpadd=interp

## We add it for every subsequent year
## Trying to do all years at the same time makes R crash
for(i in 1:17){
    print(i)
    flush.console()
    interp=tween_elements(df[df$year %in% c(i,i+1),], "year", group="id", ease="ease", nframes=5)
    interp$.frame=interp$.frame+i*5
    gc()
    interp=as.data.table(interp)
    interp[,c("x", "y") := tstrsplit(.group, " ", fixed=T)]
    interp$x=as.numeric(interp$x)
    interp$y=as.numeric(interp$y)
    interpadd=rbind(interpadd, interp)  
}


titles=c(rep(2000,3), rep(2001:2016, each=5), rep(2017,3))
saveGIF({
    for(i in 0:85){
    print(i)
    flush.console()
    test=smaller
    gc()
    test[cellFromXY(test, interpaddu[interpaddu$.frame==2,c("x","y")])]=interpaddu$value[interpaddu$.frame==i]
    plot(test, box=F, legend=F, axes=F, col=brewer.pal(9, "Greens"))
    mtext(titles[i+1], cex=2, line=2)
}
}, ani.width=800, ani.height=800, interval=0.01)

Was it worth writing all that code just to have a smooth animation that shows exactly the same as before? You be the judge.

Forest loss in the Tongod district of the Sabah region, fluid animation.
Forest loss in the Tongod district of the Sabah region, fluid animation.

Total loss and protected areas

Let’s load protected areas, downloadable from the WDPA website, and crop it to the region:

wdpa=readOGR(dsn=".", "WDPA_Dec2019-shapefile-polygons")
wdpa=crop(wdpa, extent(tongod))

We replace the value 0 in the loss dataset with NA to make its background transparent. This way we can easily superimpose layers on one another.

loss.transp=loss.tongod.ext
values(loss.transp)[values(loss.transp)==0]=NA
plot(cover00.tongod.ext)
plot(loss.transp, add=T, col="red")
plot(tongod, add=T)
plot(wdpa, add=T)
Protected areas (blue) have little or no deforestation (red). The Area of Interest (AOI) for the rest of the analysis is highlighted in yellow
Protected areas (blue) have little or no deforestation (red). The Area of Interest (AOI) for the rest of the analysis is highlighted in yellow

Area of interest for fragmentation

When studying the impact of deforestation, we are often interested in how less “livable” the partially deforested areas have become for a species of interest. You may have seen dramatic images of orang utangs fighting against excavators. But logging does not happen uniformly at the forest’s border: some areas are easier to access than others, which can break up a previously continuous habitat and isolate patches of forest. This means that logging activities can cause the loss of an area much larger than the area actually deprived of trees. Fragmentation is one way to quantify this type of habitat loss.

From the plot above, we select an area where deforestation has had a big impact, where we will be able to study habitat fragmentation.

aoi.extent=extent(c(117.2, 117.5, 4.65, 4.87))
aoi.cover00=crop(cover00.tongod.ext, aoi.extent)
aoi.loss=crop(loss.transp, aoi.extent)

restrict.to.shape=function(raster, shape){
    library(velox)
    retraster=raster
    values(retraster)=NA
    vx=velox(retraster)
    vx$rasterize(shape, "ID_1")
    retraster=vx$as.RasterLayer()
    values(retraster)[!is.na(values(retraster))]=values(raster)[!is.na(values(retraster))]
    return(retraster)
}
aoi.cover00=restrict.to.shape(aoi.cover00, tongod)
aoi.loss=restrict.to.shape(aoi.loss, tongod)
plot(aoi.cover00)
plot(aoi.loss, add=T, col="red")

At this point let’s simplify a bit. Our habitat is going to be all areas covered at more than 80% by forest in 2000.

values(aoi.cover00)[values(aoi.cover00)>0.8]=1
values(aoi.cover00)[values(aoi.cover00)<0.8]=0

We incorporate 17-year loss to produce tree cover in 2018:

aoi.cover18=aoi.cover00
values(aoi.cover18)[!is.na(values(aoi.loss))]=0

Projections

SDMTools understands lat/lon coordinates, which the Hansen tree cover dataset uses, so we don’t need to transform our data. If we wanted to think in terms of square kilometers, we would need to project our dataset. Read about this here. To project, we need to know which zone we are in. According to this website, Sabah is in UTM Zone 50. We project:

newproj=CRS("+proj=utm +zone=50 +datum=WGS84")
aoi.cover00=projectRaster(aoi.cover00, crs=newproj)
aoi.cover18=projectRaster(aoi.cover18, crs=newproj)
res(aoi.cover00)

This tells us that we have a resolution of approximately 27.65m per pixel edge (30m at the equator). Because the projection does an average of the cell values, it’s better to do this early on, otherwise you will have to re-threshold the data.

Patch statistics

Then we calculate patch statistics:

library(SDMTools)
patchStats00=ClassStat(aoi.cover00, latlon=T)
patchStats18=ClassStat(aoi.cover18, latlon=T)

We can concentrate on a few key indicators. During these 17 years, 8 patches of forest with an average patch area of 66 km2 for a total of 528 km2 became 2,825 patches with an average area of 0.1 km2 and a total area of 318 km2. This represents a dramatic example of fragmentation and area loss.

Distance between patches

While it’s easy to get the distribution of areas for our 2,825 patches, computing shortest distance between patches is much harder, because the distance is minimised across all points of the edges of our patches.

Let’s see how we do it for areas: first we label the patches, then either run ClassStats as before, or even simpler, count the cells for each possible value with table:

values(aoi.cover18)[is.na(values(aoi.cover18))]=0
labelled.patches=ConnCompLabel(aoi.cover18) 
labelled.stats=ClassStat(labelled.patches)
areas=labelled.stats$total.area
## or, simply
areas=table(values(labelled.patches))
hist(areas)

For distance between patches, we proceed a different way. We take advantage of the fact that if two patches are closer than \(\LARGE d\), then they will be merged if we extend a buffer of size \(\LARGE\frac{d}{2}\) around them. By studying the decrease in patch number with increasing buffer sizes, we can quantify the distribution of distances between patches.

increment=seq(1, 500, by=15) ## our cell dimension is about 30x30
npatch=NULL
values(aoi.cover18)[values(aoi.cover18)==0]=NA
for(margin in increment){
    buffered=buffer(aoi.cover18, increment)
    plot(buffered)
    stats=ClassStats(labelled.patches, latlon=T)
    npatch=c(npatch, stats$n.patches)
}

This is nice, but the buffer function is extremely slow. We can use a trick by using a faster algorithm. We convert the raster to two data frames: one contains the coordinates of forested points, the second of deforested points.

buffer.around.raster=function(raster, margin){
    library(RANN)
    df=coordinates(raster)
    df=as.data.frame(df)
    df$value=values(raster)
    one=df[!is.na(df$value),]
    two=df[is.na(df$value),]
    one$value=NULL
    two$value=NULL

    if(margin==0){
        return(raster)
    }
    test.d=nn2(one, two, k=1)
    test.buf=raster
    values(test.buf)=NA
    test.buf[cellFromXY(test.buf, one[, c("x", "y")])]=0
    test.buf[cellFromXY(test.buf, two[, c("x", "y")])]=test.d$nn.dist
    test.buf[values(test.buf)>margin]=NA
    test.buf[!is.na(values(test.buf))]=1
    return(test.buf)
}

We just change our previous bit of code to call the new function:

increment=seq(0, 500, by=15) ## our cell dimension is about 30x30
npatch=NULL
values(aoi.cover18)[values(aoi.cover18)==0]=NA

aoi.cover18.p=projectRaster(aoi.cover18, crs=newproj)
values(aoi.cover18.p)[!is.na(values(aoi.cover18.p))]=1

for(margin in increment){
    buffered=buffer.around.raster(aoi.cover18.p, margin)
    plot(buffered, main=paste("buffer =", margin))
    stats=ClassStat(buffered, latlon=F)
    npatch=c(npatch, stats$n.patches)
}

plot(npatch, type="b", xaxt="n", xlab="distance")
axis(seq(0, (length(npatch)-1)*15, by=15),side=1, at=1:length(npatch), las=2)
Number of patches per buffer increment.
Number of patches per buffer increment.

We can actually stop the calculation once we see that we’ve reached 1 giant patch. The graph shows us that many patches are fairly close to each other (<60m), while most are separated by less than 20m. Note that the max number of patches, now 2222, has been reduced by projection effects. This calculation would have taken about a day with the raster::buffer function, it takes about a minute using RANN:nn2.

Distance to feature

We can also very easily calculate the shortest distance of any point of a given class to any point of another class with the same trick. Let’s load the roads, which we get from GeoFabrik:

roads=spTransform(readOGR(dsn=".", "gis_osm_roads_free_1"), CRS=newproj)
roadmin=crop(roads, extent(aoi.cover18.p))
rm(roads) ## this takes up a lot of RAM
gc()

deforested areas, roads added.
deforested areas, roads added.

We rasterise them:

blank=aoi.cover18.p
values(blank)=NA
vx=velox(blank)
vx$rasterize(roadmin, "osm_id")
roads=vx$as.RasterLayer()
values(roads)[!is.na(values(roads))]=1

Now, we have two rasters, one containing the forested/deforested areas, the other one containing the roads. Let’s calculate the closest distance to a road across the entire region using the previously described trick:

df=coordinates(roads)
df=as.data.table(df)
df$value=values(roads)
road=df[!is.na(df$value),]
road$value=NULL

nonroad=df[is.na(df$value),]
nonroad$value=NULL

dist=nn2(road, nonroad, k=1)
dist.raster=roads
values(dist.raster)=NA
dist.raster[cellFromXY(dist.raster, road[, c("x", "y")])]=0
dist.raster[cellFromXY(dist.raster, nonroad[, c("x", "y")])]=dist$nn.dist

closest distance to a road.
closest distance to a road.

Deforestation as a function of distance

Do we see more deforestation in areas close to roads? In other words, are deforested areas closer to roads than non-deforested ones? We not have all the tools to answer that question:

# We regenerate aoi.cover18 to distinguish between out of AOI and deforested
aoi.cover18=aoi.cover00
values(aoi.cover18)[!is.na(values(aoi.loss))]=0
aoi.cover18.p=projectRaster(aoi.cover18, crs=newproj)
values(aoi.cover18.p)[!is.na(values(aoi.cover18.p)) & values(aoi.cover18.p)<0.5]=0
values(aoi.cover18.p)[!is.na(values(aoi.cover18.p)) & values(aoi.cover18.p)>0.5]=1

## get distance to road for each type of forestation
forested=values(dist.raster)[!is.na(values(aoi.cover18.p)) & values(aoi.cover18.p)==1]
deforested=values(dist.raster)[!is.na(values(aoi.cover18.p)) & values(aoi.cover18.p)==0]
boxplot(forested, deforested, names=c("forested", "deforested"), ylab="distance")
distance to road vs deforestation.
distance to road vs deforestation.

From this plot, we see a clear effect of road proximity on deforestation. This can be confirmed by a hypothesis test, such as the T-test (or another distribution comparison test, since the distance distributions are not guaranteed to be normal).