Introduction to treeco

The goal of treeco is to make it easy for R users to extract ecosystem and economic benefits of trees. Similar tools like i-Tree, Davey Tree Calculator, and OpenTreeMap are also available and I would encourage you to check them out. These tools heavily influenced treeco.

Note that this package is currently labeled as experimental. Users should expect breaking changes. For example, eco_run_all now requires both the common name and botanical name fields, where it once required only the former. I’m doing my best to make these changes only when they make sense and improve treeco significantly. My goal is to eventually change that label to stable. If that hasn’t scared you away, please keep reading!

Getting started

I’m going to use the trees dataset provided in base R:

str(trees)
#> 'data.frame':    31 obs. of  3 variables:
#>  $ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
#>  $ Height: num  70 65 63 72 81 83 66 75 80 75 ...
#>  $ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...

We have 3 variables and 31 observations. The first thing to look at is the variables. We have:

Girth: This is the tree’s diameter in inches, often referred to as diameter at breast height, or DBH for short.
Height: The height of the tree in feet.
Volume: The volume of timber in cubic feet.

We are missing three important and required bits of information:

Common name: The common name given to the tree.
Botanical name: The botanical or scientific name given to the tree.
Region: The region these trees belong to.

Those three fields along with DBH are required to extract eco benefits. Below is an explanation of why:

DBH: This value is an input to the equation used to interpolate benefits.
Common name: The benefits of a tree differ depending on the species.
Botanical name: The same reasoning as for the common name applies, with one addition: treeco requires both fields to maximize the number of records it can match. More on this later.
Region: The benefits of an identical tree in two different regions will often differ. For example, a palm tree in California might not be as valuable as a palm tree on the East Coast.
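
To make the role of DBH concrete, here is a minimal sketch of linear interpolation using base R’s approx(). It assumes benefits are known at a handful of DBH breakpoints; the breakpoints, the benefit values, and the helper name interp_benefit are all made up for illustration and are not treeco’s or i-Tree’s actual parameters or method.

```r
# Hypothetical benefit values at a few DBH breakpoints (inches).
# These numbers are invented for illustration; they are not i-Tree data.
dbh_breaks     <- c(3, 6, 12, 18)
benefit_breaks <- c(0.10, 0.25, 0.60, 1.00)

# Hypothetical helper: linearly interpolate a benefit for any DBH.
# rule = 2 clamps DBH values outside the breakpoints to the nearest end value.
interp_benefit <- function(dbh) {
  stats::approx(x = dbh_breaks, y = benefit_breaks, xout = dbh, rule = 2)$y
}

# A 9 inch tree sits halfway between the 6 and 12 inch breakpoints.
interp_benefit(c(6, 9))
#> [1] 0.250 0.425
```

Without a DBH value there is nothing to interpolate on, which is why the field is required.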

Guessing a missing name field

Given the data we have, we can’t extract the benefits because we’re missing too many fields. Fortunately, there is some info we can use in the docs; type ?trees in the R console to take a look. We see that these are Black Cherry trees. After some googling, I found that this data was collected in the Allegheny National Forest in Pennsylvania. I’m going to add the common name as a field common and add a row number field rn. More on why rn is added later.

library(treeco)
library(dplyr)
library(tibble)

trees <- trees %>% 
  mutate(common = "black cherry tree") %>% 
  rownames_to_column("rn") %>% 
  as_tibble() %>% 
  print()
#> # A tibble: 31 x 5
#>    rn    Girth Height Volume common           
#>    <chr> <dbl>  <dbl>  <dbl> <chr>            
#>  1 1       8.3     70   10.3 black cherry tree
#>  2 2       8.6     65   10.3 black cherry tree
#>  3 3       8.8     63   10.2 black cherry tree
#>  4 4      10.5     72   16.4 black cherry tree
#>  5 5      10.7     81   18.8 black cherry tree
#>  6 6      10.8     83   19.7 black cherry tree
#>  7 7      11       66   15.6 black cherry tree
#>  8 8      11       75   18.2 black cherry tree
#>  9 9      11.1     80   22.6 black cherry tree
#> 10 10     11.2     75   19.9 black cherry tree
#> # ... with 21 more rows

Now all that’s left is to identify the botanical name for a Black Cherry tree. This is required because all benefits rely on a master list of 3,000+ species created by i-Tree.

String comparison in R is exact, so the value “black cherry tree” will not match i-Tree’s “Black cherry tree” because of the capital “B”. Even worse, i-Tree might call it “Black cherry” and omit the word “tree”, which makes the link between the two that much more difficult to identify. The best treeco can do is quantify the similarity between the user’s data and the master species list and then link to the most similar record found in i-Tree. It does this first for the common name field and then for the botanical name field, which is why both fields are required: to maximize the number of matches. This is where eco_guess plays a role, for example:

x <- c("common fig", "Commn FIG", "RED MAPLE")
eco_guess(x, "botanical")
#> [1] "ficus carica" "ficus carica" "acer rubrum"
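
To give a feel for how this kind of matching can work, here is a rough sketch using edit distance from base R’s adist(). It is not eco_guess’s actual implementation, which may measure similarity differently; the three-row master table and the helper guess_botanical are stand-ins invented for this example.

```r
# A tiny stand-in for the master species list (the real one has 3,000+ species).
master <- data.frame(
  common    = c("common fig", "red maple", "black cherry"),
  botanical = c("ficus carica", "acer rubrum", "prunus serotina"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the botanical name whose common name is closest.
guess_botanical <- function(x, master) {
  # Lowercase first so that case differences don't count as edits.
  # adist() returns a matrix: one row per input, one column per master name.
  d <- utils::adist(tolower(x), master$common)
  # For each input, pick the master record with the smallest edit distance.
  master$botanical[apply(d, 1, which.min)]
}

guess_botanical(c("common fig", "Commn FIG", "RED MAPLE", "black cherry tree"), master)
#> [1] "ficus carica"    "ficus carica"    "acer rubrum"     "prunus serotina"
```

Note that a nearest-match approach like this always returns something, even for input that resembles nothing in the list, so a real implementation would likely want a similarity threshold.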

And for the trees dataset, I can do something like:

trees <- trees %>% 
  mutate(botanical = eco_guess(common, "botanical")) %>% 
  print()
#> # A tibble: 31 x 6
#>    rn    Girth Height Volume common            botanical      
#>    <chr> <dbl>  <dbl>  <dbl> <chr>             <chr>          
#>  1 1       8.3     70   10.3 black cherry tree prunus serotina
#>  2 2       8.6     65   10.3 black cherry tree prunus serotina
#>  3 3       8.8     63   10.2 black cherry tree prunus serotina
#>  4 4      10.5     72   16.4 black cherry tree prunus serotina
#>  5 5      10.7     81   18.8 black cherry tree prunus serotina
#>  6 6      10.8     83   19.7 black cherry tree prunus serotina
#>  7 7      11       66   15.6 black cherry tree prunus serotina
#>  8 8      11       75   18.2 black cherry tree prunus serotina
#>  9 9      11.1     80   22.6 black cherry tree prunus serotina
#> 10 10     11.2     75   19.9 black cherry tree prunus serotina
#> # ... with 21 more rows

Finally, we need to identify the region code for Pennsylvania. I don’t have a great way of doing this yet; adding a function that identifies the region code from a ZIP code, state, city, etc. is on my list. For now, you can use Davey Tree’s tree benefit calculator to figure out the region and then look up its code in the money dataset:

tmoney
#> # A tibble: 144 x 4
#>    region    region_name               conversion                   value
#>    <chr>     <chr>                     <chr>                        <dbl>
#>  1 CaNCCoJBK Northern California Coast electricity_kwh_to_currency 0.132 
#>  2 CenFlaXXX Central Florida           electricity_kwh_to_currency 0.132 
#>  3 GulfCoCHS Coastal Plain             electricity_kwh_to_currency 0.0934
#>  4 InlEmpCLM Inland Empire             electricity_kwh_to_currency 0.201 
#>  5 InlValMOD Inland Valleys            electricity_kwh_to_currency 0.117 
#>  6 InterWABQ Interior West             electricity_kwh_to_currency 0.0788
#>  7 LoMidWXXX Lower Midwest             electricity_kwh_to_currency 0.068 
#>  8 MidWstMSP Midwest                   electricity_kwh_to_currency 0.0759
#>  9 NMtnPrFNL North                     electricity_kwh_to_currency 0.636 
#> 10 NoEastXXX Northeast                 electricity_kwh_to_currency 0.140 
#> # ... with 134 more rows

tmoney %>% 
  filter(region_name == "Northeast") %>% 
  distinct(region) %>% 
  .[[1]]
#> [1] "NoEastXXX"
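
If you look up region codes often, the filter above is easy to wrap in a small function. The sketch below uses base R only; region_code is a hypothetical helper, not part of treeco, and money_sample is a stand-in data frame holding a few region/name pairs copied from the output above.

```r
# Stand-in for the money dataset: a few region/name pairs from the output above.
# Regions repeat in the real data (one row per conversion), so one is duplicated.
money_sample <- data.frame(
  region      = c("CenFlaXXX", "MidWstMSP", "NoEastXXX", "NoEastXXX"),
  region_name = c("Central Florida", "Midwest", "Northeast", "Northeast"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up a region code by name, ignoring case.
region_code <- function(name, money) {
  hit <- unique(money$region[tolower(money$region_name) == tolower(name)])
  if (length(hit) == 0) {
    stop("No region named '", name, "' found.", call. = FALSE)
  }
  hit
}

region_code("northeast", money_sample)
#> [1] "NoEastXXX"
```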

Calculating the benefits

Before we calculate the benefits, note that most of the steps above usually won’t be necessary; they’re only there to construct and describe the data frame that eco_run_all needs. In most cases, the typical workflow will be:

Import the data
Guess the common/botanical field if either is missing
Calculate the benefits

my_trees <- 
  eco_run_all(
    data = trees, 
    common_col = "common", 
    botanical_col = "botanical", 
    dbh_col = "Girth", 
    region = "NoEastXXX"
  ) %>% 
  as_tibble() %>% 
  print()
#> Gathering species matches...
#> Gathering interpolation parameters...
#> Interpolating benefits...
#> # A tibble: 465 x 8
#>    botanical    common     dbh benefit_value benefit   unit  dollars rn   
#>    <chr>        <chr>    <dbl>         <dbl> <chr>     <chr>   <dbl> <chr>
#>  1 prunus sero… black c…   8.3        0.277  aq nox a… lb       1.27 1    
#>  2 prunus sero… black c…   8.3        0.067  aq nox d… lb       0.31 1    
#>  3 prunus sero… black c…   8.3        0.154  aq ozone… lb       0.71 1    
#>  4 prunus sero… black c…   8.3        0.0183 aq pm10 … lb       0.15 1    
#>  5 prunus sero… black c…   8.3        0.075  aq pm10 … lb       0.62 1    
#>  6 prunus sero… black c…   8.3        0.123  aq sox a… lb       0.43 1    
#>  7 prunus sero… black c…   8.3        0.0262 aq sox d… lb       0.09 1    
#>  8 prunus sero… black c…   8.3        0.0112 aq voc a… lb       0.03 1    
#>  9 prunus sero… black c…   8.3        0      bvoc      lb       0    1    
#> 10 prunus sero… black c…   8.3       84.4    co2 avoi… lb       0.28 1    
#> # ... with 455 more rows

Notice that the Height and Volume fields are missing. This is because eco_run_all strips the input data of everything except what it needs: the row number, common name, botanical name, and DBH fields. It does this in an effort to keep the data small. Not too long ago, eco_run_all took two and a half minutes to calculate the benefits for 400,000 trees. It now takes a couple of seconds, depending on how many unique values the data contains. The removal of unneeded data is why I added the rn field at the beginning: it preserves the row number so the original fields can be linked back to the benefits dataset my_trees:

trees %>% 
  select(rn, Height, Volume) %>% 
  right_join(my_trees) %>% 
  glimpse()
#> Joining, by = "rn"
#> Observations: 465
#> Variables: 10
#> $ rn            <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"...
#> $ Height        <dbl> 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, ...
#> $ Volume        <dbl> 10.3, 10.3, 10.3, 10.3, 10.3, 10.3, 10.3, 10.3, ...
#> $ botanical     <chr> "prunus serotina", "prunus serotina", "prunus se...
#> $ common        <chr> "black cherry", "black cherry", "black cherry", ...
#> $ dbh           <dbl> 8.3, 8.3, 8.3, 8.3, 8.3, 8.3, 8.3, 8.3, 8.3, 8.3...
#> $ benefit_value <dbl> 0.2773, 0.0670, 0.1541, 0.0183, 0.0750, 0.1232, ...
#> $ benefit       <chr> "aq nox avoided", "aq nox dep", "aq ozone dep", ...
#> $ unit          <chr> "lb", "lb", "lb", "lb", "lb", "lb", "lb", "lb", ...
#> $ dollars       <dbl> 1.27, 0.31, 0.71, 0.15, 0.62, 0.43, 0.09, 0.03, ...

This is especially useful given that most tree data is spatial and includes coordinates. Whether or not the approach (stripping data, then joining at the end) is a good idea is certainly up for debate and is another reminder of why this package is experimental.
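
Because the benefits come back in long form (one row per tree per benefit), it is often handy to collapse them to one row per tree before joining. Here is a small base R sketch of that idea; the two data frames are stand-ins that mimic the shapes above. The dollar values for tree 1 are taken from the output above, while those for tree 2 are invented for illustration.

```r
# Stand-in for the original data: one row per tree, keyed by rn.
tree_attrs <- data.frame(
  rn     = c("1", "2"),
  Height = c(70, 65),
  stringsAsFactors = FALSE
)

# Stand-in for the long benefits table: several rows per tree.
# Tree 2's dollar values are made up for this example.
benefits <- data.frame(
  rn      = c("1", "1", "2", "2"),
  benefit = c("aq nox avoided", "aq nox dep", "aq nox avoided", "aq nox dep"),
  dollars = c(1.27, 0.31, 1.30, 0.32),
  stringsAsFactors = FALSE
)

# Sum the dollar value of every benefit for each tree...
totals <- aggregate(dollars ~ rn, data = benefits, FUN = sum)

# ...then join back on the explicit key so each tree keeps its original fields.
per_tree <- merge(tree_attrs, totals, by = "rn")
per_tree
```

Keeping rn the same type in both tables (character here) matters, since a join on mismatched key types will either fail or silently match nothing, depending on the tool.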

Future plans

I have a few ideas for the future of treeco:

Add reporting features similar to i-Tree by utilizing R Markdown.
Make eco_run_all more verbose so that it tells the user how many records were matched.
Utilize the sf package for mapping and other applications, like guessing the user’s region.
Add additional benefits. I’m leaning towards adding an argument, expanded, to include them.
Warn the user if the benefits will take longer than expected to calculate.

Criticism, issues, and enhancement requests are encouraged and can be filed here.

Tyler Littlefield

2018-12-08
