This uses R and RStudio installed directly on your system if you have both of those installed skip to step 3.
Install R from CRAN: https://cran.r-project.org/
Install RStudio IDE: https://posit.co/download/rstudio-desktop/
Install required R packages.
installed <- rownames(installed.packages())
if(!"remotes" %in% installed)
install.packages("remotes")
if(!"rnaturalearthdata" %in% installed)
install.packages("rnaturalearthdata")
remotes::install_github("birdflow-science/BirdFlowModels")
remotes::install_github("birdflow-science/BirdFlowR", build_vignettes = TRUE)
Package dependencies can be a pain. If the above doesn’t work you can also try the alternative method below, also executed in the RStudio console.
installed <- rownames(installed.packages())
if(!"pak" %in% installed)
install.packages("pak")
pak::pkg_install("rnaturalearthdata", ask = FALSE)
pak::pkg_install("birdflow-science/BirdFlowModels", ask = FALSE, )
pak::pkg_install("birdflow-science/BirdFlowR", ask = FALSE,
dependencies = TRUE)
library(BirdFlowR)
library(ebirdst)
library(terra)
# ebirdst::set_ebirdst_access_key() ### Edit as needed
Set the output directory and choose a species.
Using “./” will output to the BirdFlowDemo directory.
dir <- "./" ### Edit as needed
species <- "Eastern Kingbird" ### Edit as desired
if(is.na(ebirdst::get_species(species)))
stop(species, "is not recognized")
Do the preprocessing. This will write an hdf5 file to the output dir (./).
bf <- preprocess_species(species, dir, gb = 1)
## Species resolved to: 'easkin' (Eastern Kingbird)
## Data already exists, use force = TRUE to re-download.
## Setting max_params to 23224801 anticipating 1 GB of memory.
## Calculating resolution
## Attempt 1 at setting resolution
## (152.318km chosen)
## 246.6008 % of target (estimate).
## trying again
## Attempt 2 at setting resolution
## (190.875km chosen)
## 110.0405 % of target (estimate).
## trying again
## Attempt 3 at setting resolution
## (195.495km chosen)
## 104.1728 % of target (estimate).
## trying again
## Attempt 4 at setting resolution
## (197.504km chosen)
## 98.15342 % of target (estimate).
## success
## Rounded to 198 final resolution.
## Reading low resolution (27 km) geoTIFFs
## Creating mask in target resolution and projection
## Reprojecting and cropping to mask:
## abundance done.
## Upper CI done.
## Lower CI done.
## Resampling to target resolution (198 km)
## Model has:
## 575 active cells,
## 52 transitions, and
## 17,193,075 parameters 74% of maximum parameters)
## Writing hdf5: ./easkin_2021_198km.hdf5
Plot a few of the distributions to make sure things look ok.
plot(rast(bf, c(1, 12, 24, 28)))
https://colab.research.google.com/drive/1WbfvUqax5IIyHnrt3y7SqOfavQKs-xB3
model_file <- paste0(dir, species(bf, "code"), "_trained.hdf5") # automated guess
# model_file <- "./amewoo_trained" ### Edit as needed (to manually specify file)
stopifnot(file.exists(model_file))
bf <- import_birdflow(model_file)
bf
## Eastern Kingbird BirdFlow model
## dimensions : 50, 35, 52 (nrow, ncol, ntimesteps)
## resolution : 198000, 198000 (x, y)
## active cells : 575
## size : 131.4 Mb
Sparsification reduces memory usage, file size, and processing time for the model.
The state method eliminates all transitions into and out of model
states (locations in space and time) for which the ebirdst distribution
is zero.
Those states are thus removed from the model. It reduces file size the
least of the methods we’ve tried but also preserves the model structure
the most.
Using “state+conditional” will reduce file size substantially more but will also eliminate additional model states (after dead ends are cleaned up).
More on sparsification methods? in the help with
?sparasify()
.
bf <- sparsify(bf, method = "state")
## Evaluating full model performance
## Starting state based sparsification.
## Evaluating post-sparsification performance
## 85.5% zero
## 0.187% density lost
## traverse correlation:
## full: 0.982
## sparse: 0.982
## Warning in bf$metadata$sparse$method <- method: Coercing LHS to a list
# bf <- sparsify(bf, method = "state+conditional", p = 0.995) # This will reduce the file size a lot more
print(bf$metadata$sparse$stats)
## model pct_zero pct_density_lost mean_step_cor min_step_cor traverse_cor
## 1 full 0.000 0.0000000 0.9995108 0.9982212 0.9824825
## 2 sparse 85.487 0.1867287 0.9995116 0.9982210 0.9823185
## mean_distr_cor min_distr_cor
## 1 0.9995813 0.9979229
## 2 0.9995797 0.9979223
Definitions of performance metrics are available with
?evaluate_performance()
Saving the sparse model will save time and disk space if we want to use it later. Here we write it as a serialized R object which is fast and efficient but not at all portable to other software.
sparse_file <- file.path(dir, "sparse.Rds")
saveRDS(bf, file = sparse_file)
In a later session we could then read it with.
bf <- readRDS(sparse_file)
species_info()
and get_metadata()
take a
BirdFlow object as their first argument. An optional second argument
allows specifying a specific item, if omitted a list is returned with
all the available information.
species(bf)
is a shortcut for
species_info(bf, "common_name")
Use ?species_info()
to see descriptions of all the
available information. Dates associated with migration and resident
seasons are likely to be useful.
species(bf)
## [1] "Eastern Kingbird"
species(bf, "scientific")
## [1] "Tyrannus tyrannus"
species_info(bf, "prebreeding_migration_start")
## [1] "2021-03-08"
si <- species_info(bf) # list with all species information
md <- get_metadata(bf) # list with all metadata
get_metadata(bf, "birdflow_model_date") # date model was exported from python
## [1] "2023-03-09 03:24:42.112366"
validate_BirdFlow(bf) # throws error if there are problems
BirdFlow models are based on a raster representation of a time series
of species distributions and contain all the spatial information
necessary to recreate those distributions and to define how the raster
is positioned in space. BirdFlowR uses the terra package to import
raster data and provides
BirdFlow methods for functions defined in the terra package - so that
you can use those functions on BirdFlow objects.
crs()
returns the coordinate reference system - useful
if you need to project other data to match the BirdFlow object.
res()
, xres()
, and yres()
describe the dimensions of individual cells in the model.
ext()
returns a terra extent object.
# Methods for terra functions:
a <- crs(bf) # well known text (long)
crs(bf, proj = TRUE) # proj4 string
## [1] "+proj=laea +lat_0=19.497 +lon_0=-89.678 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
res(bf)
## [1] 198000 198000
c( xres(bf), yres(bf) ) # same as res(bf)
## [1] 198000 198000
ext(bf)
## SpatExtent : -3054857.14285714, 3875142.85714286, -4752000, 5148000 (xmin, xmax, ymin, ymax)
c(xmin(bf), xmax(bf), ymin(bf), ymax(bf)) # same as ext(bf)
## [1] -3054857 3875143 -4752000 5148000
A distribution in BirdFlow is stored as a vector of values that
correspond to only the active cells (n_active()
) in the
model. Multiple distributions are stored as matrices with
n_active()
rows (a column for each distribution).
We can retreive distributions in this format with
get_distr()
. Use timestep, character dates, date objects,
or “all” to specify which distributions to retrieve.
Retrieve the first distribution and compare its length to the number of active cells.
d <- get_distr(bf, 1) # get first timestep distribution
length(d) # 1 distribution so d is a vector
## [1] 575
n_active(bf) # its length is the the number of active cells in the model
## [1] 575
Get 5 distributions, the result is a matrix in which each column is a distribution with a row for each active cell.
d <- get_distr(bf, 26:30)
dim(d)
## [1] 575 5
head(d, 3)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 0 0 0 0
## [2,] 0 0 0 0 0
## [3,] 0 0 0 0 0
We can also specify distributions with dates, or use “all” to retrieve all the distributions.
d <- get_distr(bf, "2022-12-15") # from character date
d <- get_distr(bf, "all") # all distributions
d <- get_distr(bf, Sys.Date()) # Using a Date object
Use rasterize_distr()
to convert a distribution to a
SpatRaster defined in the terra package. The second argument, the
BirdFlow model, is needed for the spatial information it contains.
d <- get_distr(bf, c(1, 26)) # winter and summer
r <- rasterize_distr(d, bf) # convert to SpatRaster
Alternatively convert directly from BirdFlow to SpatRast with
rast()
the second, optional, argument which
accepts the same inputs a as which
in
get_distr()
.
r <- rast(bf) # all distributions
r <- rast(bf, c(1, 26)) # 1st, and 26th timesteps.
plot(r)
BirdFlowR provides some convenience wrappers to functions in rnaturalearth that load vector data and then crop and reproject it to make it suitable for plotting with BirdFlow output.
r <- rast(bf, species_info(bf, "prebreeding_migration_start"))
plot(r)
coast <- get_coastline(bf) # lines
plot(coast, add = TRUE)
In this section we will sample a single starting location from the winter distribution and project it forward to generate a distribution of predicted breeding grounds for birds that wintered at the starting location.
Set forecast parameters.
start <- 1 # winter
end <- 26 # summer
sample_distr()
will sample from one or more input
distribution to select a single location per distribution. The result is
one or moe distributions with ones in the selected location(s) and zero
elsewhere.
set.seed(0)
d <- get_distr(bf, start)
location <- sample_distr(d)
print( i_to_xy( which( as.logical(location) ), bf) ) # starting coordinates
## x y
## [1,] 3182143 -3663000
Forecast returns the distribution over time as a matrix with one column per timestep.
The plot shows where birds that winter at a particular location are likely to be as the year progresses and ultimately where they might spend their summer. The probability density spreads as the weeks progress.
f <- forecast(bf, location, start, end, "forward")
r <- rasterize_distr(f[, c(1, 7, 14, 19)], bf)
plot(r)
Alternatively we can calculate the difference between the projected distribution for bird’s wintering at that particular location and the distribution of the species as a whole at the same timestep.
projected <- f[ , ncol(f)] # last projected distribution
diff <- projected - get_distr(bf, end)
plot(rasterize_distr(diff, bf))
plot(coast, add = TRUE)
In this section we sample locations from the American Woodcock winter distribution and then generate routes to their summer grounds.
Set route parameters.
n_positions <- 15 # number of starting positions
n_each <- 1 # how many birds to start at each
start <- 1 # starting timestep (winter)
end <- 26 # ending timestep (summer)
First extract the winter distribution, then use
sample_locations()
with n = n_positions
to
sample the input distribution repeatedly resulting in a matrix in which
each column has a single 1 representing the sampled location.
d <- get_distr(bf, start)
locations <- sample_distr(d, n = n_positions)
Collapse the locations down to a vector of the index of each non-zero value and then convert to x and y coordinates.
ind <- apply(locations, 2, function(x) which( as.logical(x) ) )
x <- i_to_x(ind, bf)
y <- i_to_y(ind, bf)
Plot the starting (winter) distribution and sampled locations
winter <- rasterize_distr(d, bf)
plot(winter)
points(x, y)
plot(coast, add = TRUE)
route()
will generate synthetic routes for each starting
position. Currently route()
returns a list with two items
(this may change):
points
a data.frame with a row for each timestep of
each routelines
sf object containing a line for each routerts <- route(bf, x_coord = x, y_coord = y, start = start,
end = end, n = n_each)
head(rts$points, 4)
## x y route timestep date
## 1 2984143 -4059000 1 1 2021-01-04
## 2 2984143 -4059000 1 2 2021-01-11
## 3 2984143 -4059000 1 3 2021-01-18
## 4 1994143 -3069000 1 4 2021-01-25
Plot the route lines over the summer distribution along with points at the starting and ending positions.
d <- get_distr(bf, end)
summer <- rasterize_distr(d, bf)
line_col <- rgb(0, 0, 0, .2)
pt_col <- rgb(0, 0, 0, .5 )
plot(summer)
points( x, y, cex = .4, col = pt_col, pch = 16) # starting points
plot(rts$lines, add = TRUE, col = line_col) # routes
end_pts <- rts$points[rts$points$timestep == end, ] # end points
points(x = end_pts$x, y = end_pts$y,
cex = 0.4, pch = 12, col = pt_col )
plot(coast, add = TRUE)
title(main = species(bf))