Install

This uses R and RStudio installed directly on your system if you have both of those installed skip to step 3.

  1. Install R from CRAN: https://cran.r-project.org/

  2. Install RStudio IDE: https://posit.co/download/rstudio-desktop/

  3. Install required R packages.

    installed <- rownames(installed.packages())
    if(!"remotes" %in% installed)
      install.packages("remotes")
    if(!"rnaturalearthdata" %in% installed)
      install.packages("rnaturalearthdata")
    remotes::install_github("birdflow-science/BirdFlowModels")  
    remotes::install_github("birdflow-science/BirdFlowR", build_vignettes = TRUE)

    Package dependencies can be a pain. If the above doesn’t work you can also try the alternative method below, also executed in the RStudio console.

    installed <- rownames(installed.packages())
    if(!"pak" %in% installed)
      install.packages("pak")
    
    pak::pkg_install("rnaturalearthdata", ask = FALSE)
    pak::pkg_install("birdflow-science/BirdFlowModels", ask = FALSE, )
    pak::pkg_install("birdflow-science/BirdFlowR", ask = FALSE, 
                     dependencies = TRUE)

Load R packages and setup ebirdst

library(BirdFlowR)
library(ebirdst)
library(terra)

# ebirdst::set_ebirdst_access_key()   ### Edit as needed

Preprocess

Set the output directory and choose a species.
Using “./” will output to the BirdFlowDemo directory.

dir <- "./"                    ### Edit as needed

species <- "Eastern Kingbird"   ### Edit as desired

if(is.na(ebirdst::get_species(species)))
  stop(species, "is not recognized")

Do the preprocessing. This will write an hdf5 file to the output dir (./).

bf <- preprocess_species(species, dir, gb = 1)
## Species resolved to: 'easkin' (Eastern Kingbird)
## Data already exists, use force = TRUE to re-download.
## Setting max_params to  23224801  anticipating  1  GB of memory.
## Calculating resolution
##   Attempt  1  at setting resolution
##   (152.318km chosen)
##    246.6008 % of target (estimate).
##   trying again
##   Attempt  2  at setting resolution
##   (190.875km chosen)
##    110.0405 % of target (estimate).
##   trying again
##   Attempt  3  at setting resolution
##   (195.495km chosen)
##    104.1728 % of target (estimate).
##   trying again
##   Attempt  4  at setting resolution
##   (197.504km chosen)
##    98.15342 % of target (estimate).
##  success
## Rounded to 198 final resolution.
## Reading low resolution (27 km) geoTIFFs
## Creating mask in target resolution and projection
## Reprojecting and cropping to mask:
##  abundance done.
##  Upper CI done.
##  Lower CI done.
## Resampling to target resolution (198 km)
## Model has:
##  575 active cells,
##  52 transitions, and
##  17,193,075 parameters 74% of maximum parameters)
## Writing hdf5:  ./easkin_2021_198km.hdf5

Plot a few of the distributions to make sure things look ok.

plot(rast(bf, c(1, 12, 24, 28)))

Fit the models and then come back

https://colab.research.google.com/drive/1WbfvUqax5IIyHnrt3y7SqOfavQKs-xB3

Import

model_file <- paste0(dir,  species(bf, "code"), "_trained.hdf5")   # automated guess

# model_file <- "./amewoo_trained"  ### Edit as needed (to manually specify file)

stopifnot(file.exists(model_file))

bf <- import_birdflow(model_file)
bf
## Eastern Kingbird BirdFlow model
##   dimensions   : 50, 35, 52  (nrow, ncol, ntimesteps)
##   resolution   : 198000, 198000  (x, y)
##   active cells : 575
##   size         : 131.4 Mb

Sparsify

Sparsification reduces memory usage, file size, and processing time for the model.

The state method eliminates all transitions into and out of model states (locations in space and time) for which the ebirdst distribution is zero.
Those states are thus removed from the model. It reduces file size the least of the methods we’ve tried but also preserves the model structure the most.

Using “state+conditional” will reduce file size substantially more but will also eliminate additional model states (after dead ends are cleaned up).

More on sparsification methods? in the help with ?sparasify().

 bf <- sparsify(bf, method = "state")
## Evaluating full model performance
## Starting state based sparsification.
## Evaluating post-sparsification performance
##  85.5% zero
##  0.187% density lost
##  traverse correlation:
##      full: 0.982
##      sparse: 0.982
## Warning in bf$metadata$sparse$method <- method: Coercing LHS to a list
 # bf <- sparsify(bf, method = "state+conditional", p = 0.995) # This will reduce the file size a lot more
print(bf$metadata$sparse$stats)
##    model pct_zero pct_density_lost mean_step_cor min_step_cor traverse_cor
## 1   full    0.000        0.0000000     0.9995108    0.9982212    0.9824825
## 2 sparse   85.487        0.1867287     0.9995116    0.9982210    0.9823185
##   mean_distr_cor min_distr_cor
## 1      0.9995813     0.9979229
## 2      0.9995797     0.9979223

Definitions of performance metrics are available with ?evaluate_performance()

Save sparse model

Saving the sparse model will save time and disk space if we want to use it later. Here we write it as a serialized R object which is fast and efficient but not at all portable to other software.

sparse_file <- file.path(dir, "sparse.Rds")  
saveRDS(bf, file = sparse_file)

In a later session we could then read it with.

bf <- readRDS(sparse_file)

Use the model

Species information and metadata

species_info() and get_metadata() take a BirdFlow object as their first argument. An optional second argument allows specifying a specific item, if omitted a list is returned with all the available information.

species(bf) is a shortcut for species_info(bf, "common_name")

Use ?species_info() to see descriptions of all the available information. Dates associated with migration and resident seasons are likely to be useful.

species(bf)  
## [1] "Eastern Kingbird"
species(bf, "scientific")
## [1] "Tyrannus tyrannus"
species_info(bf, "prebreeding_migration_start")
## [1] "2021-03-08"
si <-  species_info(bf) # list with all species information
md <- get_metadata(bf)  # list with all metadata
get_metadata(bf, "birdflow_model_date") # date model was exported from python
## [1] "2023-03-09 03:24:42.112366"
validate_BirdFlow(bf)  # throws error if there are problems

Spatial aspects

BirdFlow models are based on a raster representation of a time series of species distributions and contain all the spatial information necessary to recreate those distributions and to define how the raster is positioned in space. BirdFlowR uses the terra package to import raster data and provides
BirdFlow methods for functions defined in the terra package - so that you can use those functions on BirdFlow objects.

crs() returns the coordinate reference system - useful if you need to project other data to match the BirdFlow object. res(), xres(), and yres() describe the dimensions of individual cells in the model. ext() returns a terra extent object.

# Methods for terra functions:
a <- crs(bf) # well known text (long)
crs(bf, proj = TRUE)  # proj4 string
## [1] "+proj=laea +lat_0=19.497 +lon_0=-89.678 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
res(bf)
## [1] 198000 198000
c( xres(bf), yres(bf) ) # same as res(bf)
## [1] 198000 198000
ext(bf)
## SpatExtent : -3054857.14285714, 3875142.85714286, -4752000, 5148000 (xmin, xmax, ymin, ymax)
c(xmin(bf), xmax(bf), ymin(bf), ymax(bf)) # same as ext(bf)
## [1] -3054857  3875143 -4752000  5148000

Retrieve and plot distributions

A distribution in BirdFlow is stored as a vector of values that correspond to only the active cells (n_active()) in the model. Multiple distributions are stored as matrices with n_active() rows (a column for each distribution).

We can retreive distributions in this format with get_distr(). Use timestep, character dates, date objects, or “all” to specify which distributions to retrieve.

Retrieve the first distribution and compare its length to the number of active cells.

d <- get_distr(bf, 1) # get first timestep distribution 
length(d)  # 1 distribution so d is a vector
## [1] 575
n_active(bf)  # its length is the the number of active cells in the model
## [1] 575

Get 5 distributions, the result is a matrix in which each column is a distribution with a row for each active cell.

d <- get_distr(bf, 26:30) 
dim(d)
## [1] 575   5
head(d, 3)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    0    0    0    0
## [2,]    0    0    0    0    0
## [3,]    0    0    0    0    0

We can also specify distributions with dates, or use “all” to retrieve all the distributions.

d <- get_distr(bf, "2022-12-15") # from character date
d <- get_distr(bf, "all")  # all distributions
d <- get_distr(bf, Sys.Date())  # Using a Date object 

Use rasterize_distr() to convert a distribution to a SpatRaster defined in the terra package. The second argument, the BirdFlow model, is needed for the spatial information it contains.

d <- get_distr(bf, c(1, 26)) # winter and summer
r <- rasterize_distr(d, bf) # convert to SpatRaster 

Alternatively convert directly from BirdFlow to SpatRast with rast() the second, optional, argument which accepts the same inputs a as which in get_distr().

r <- rast(bf) # all distributions
r <- rast(bf, c(1, 26))  # 1st, and 26th timesteps.
plot(r)

BirdFlowR provides some convenience wrappers to functions in rnaturalearth that load vector data and then crop and reproject it to make it suitable for plotting with BirdFlow output.

r <- rast(bf, species_info(bf, "prebreeding_migration_start"))
plot(r)
coast <- get_coastline(bf)  # lines
plot(coast, add = TRUE)

Forecasting

In this section we will sample a single starting location from the winter distribution and project it forward to generate a distribution of predicted breeding grounds for birds that wintered at the starting location.

Set forecast parameters.

    start <- 1     #  winter 
    end <-  26     # summer

Sample starting distribution

sample_distr() will sample from one or more input distribution to select a single location per distribution. The result is one or moe distributions with ones in the selected location(s) and zero elsewhere.

set.seed(0)
d <- get_distr(bf, start)
location <- sample_distr(d)

print( i_to_xy( which( as.logical(location) ), bf) )  # starting coordinates
##            x        y
## [1,] 3182143 -3663000

Project forward from this location to summer

Forecast returns the distribution over time as a matrix with one column per timestep.

The plot shows where birds that winter at a particular location are likely to be as the year progresses and ultimately where they might spend their summer. The probability density spreads as the weeks progress.

f <- forecast(bf, location, start, end, "forward")

r <- rasterize_distr(f[, c(1, 7, 14, 19)], bf)
plot(r)

Alternatively we can calculate the difference between the projected distribution for bird’s wintering at that particular location and the distribution of the species as a whole at the same timestep.

projected <- f[ , ncol(f)]  # last projected distribution
diff <-  projected - get_distr(bf, end) 
plot(rasterize_distr(diff, bf))
plot(coast, add = TRUE)

Generate synthetic routes

In this section we sample locations from the American Woodcock winter distribution and then generate routes to their summer grounds.

Set route parameters.

n_positions <-  15 # number of starting positions
n_each <- 1        # how many birds to start at each
start <- 1         # starting timestep (winter)
end <- 26          # ending timestep (summer)

Generate starting locations

First extract the winter distribution, then use sample_locations() with n = n_positions to sample the input distribution repeatedly resulting in a matrix in which each column has a single 1 representing the sampled location.

d <- get_distr(bf, start)
locations  <- sample_distr(d, n = n_positions)  

Collapse the locations down to a vector of the index of each non-zero value and then convert to x and y coordinates.

ind <- apply(locations, 2, function(x) which( as.logical(x) ) )
x <- i_to_x(ind, bf)
y <- i_to_y(ind, bf)

Plot the starting (winter) distribution and sampled locations

winter <- rasterize_distr(d, bf)
plot(winter)
points(x, y)
plot(coast, add = TRUE)

Generate routes

route() will generate synthetic routes for each starting position. Currently route() returns a list with two items (this may change):

  • points a data.frame with a row for each timestep of each route
  • lines sf object containing a line for each route
rts <- route(bf, x_coord = x, y_coord = y, start = start,
             end = end, n = n_each)
head(rts$points, 4)
##         x        y route timestep       date
## 1 2984143 -4059000     1        1 2021-01-04
## 2 2984143 -4059000     1        2 2021-01-11
## 3 2984143 -4059000     1        3 2021-01-18
## 4 1994143 -3069000     1        4 2021-01-25

Plot routes

Plot the route lines over the summer distribution along with points at the starting and ending positions.

d <- get_distr(bf, end)
summer <- rasterize_distr(d, bf)

line_col <- rgb(0, 0, 0, .2)
pt_col <- rgb(0, 0, 0, .5 )

plot(summer) 
points( x, y, cex = .4, col = pt_col, pch = 16) # starting points
plot(rts$lines, add = TRUE, col = line_col)  # routes
end_pts <- rts$points[rts$points$timestep == end, ]  # end points
points(x = end_pts$x, y = end_pts$y,
       cex = 0.4, pch = 12, col = pt_col )
plot(coast, add = TRUE)
title(main = species(bf))