The goal of treeco is to make it easy for R users to extract ecosystem and economic benefits of trees. Similar tools like i-Tree, Davey Tree Calculator, and OpenTreeMap are also available and I would encourage you to check them out. These tools heavily influenced treeco.
Note that this package is currently labeled as experimental. Users should expect breaking changes. One example is that eco_run_all requires both the common name and botanical name fields, where it once required only the former. I’m doing my best to make these changes only when they make sense and improve treeco significantly. My goal is to eventually change that label to stable. If that hasn’t scared you away, please keep reading!
I’m going to use the trees dataset provided in base R:
str(trees)
#> 'data.frame': 31 obs. of 3 variables:
#> $ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
#> $ Height: num 70 65 63 72 81 83 66 75 80 75 ...
#> $ Volume: num 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
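We have 3 variables and 31 observations. The first thing to look at is the variables. We have Girth, Height, and Volume. According to ?trees, Girth is actually the tree diameter in inches, so it can stand in for diameter at breast height (DBH).
We are missing three important and required bits of information: the common name, the botanical name, and the region.
Those three fields, along with DBH, are required to extract eco benefits. The reasons are covered below as each field is added.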
Given the data we have, we can’t extract the benefits because we’re missing too many fields. Fortunately, there is some info we can use in the docs: type ?trees in the R console to take a look. We see that these are Black Cherry trees. After some googling, I found that these trees were measured in the Allegheny National Forest in Pennsylvania. I’m going to add the common name as a field called common and a row number field called rn. More on why rn is added later.
library(treeco)
library(dplyr)
library(tibble)
trees <- trees %>%
  mutate(common = "black cherry tree") %>%
  rownames_to_column("rn") %>%
  as_tibble() %>%
  print()
#> # A tibble: 31 x 5
#> rn Girth Height Volume common
#> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 1 8.3 70 10.3 black cherry tree
#> 2 2 8.6 65 10.3 black cherry tree
#> 3 3 8.8 63 10.2 black cherry tree
#> 4 4 10.5 72 16.4 black cherry tree
#> 5 5 10.7 81 18.8 black cherry tree
#> 6 6 10.8 83 19.7 black cherry tree
#> 7 7 11 66 15.6 black cherry tree
#> 8 8 11 75 18.2 black cherry tree
#> 9 9 11.1 80 22.6 black cherry tree
#> 10 10 11.2 75 19.9 black cherry tree
#> # ... with 21 more rows
Now all that’s left is to identify the botanical name for a Black Cherry tree. This is required because all benefits rely on a master list of 3,000+ species created by i-Tree.
Since string comparison in R is exact and case-sensitive, the value “black cherry tree” will not match i-Tree’s “Black cherry tree” because of the capital “B”. Even worse, i-Tree might call it “Black cherry” and omit the word “tree”, which makes the link between the two that much harder to identify. The best treeco can do is quantify the similarity between the user’s data and the master species list, and then link each value to the most similar record in i-Tree. It does this first for the common name field and then for the botanical name field, which is why both fields are required: to maximize the number of matches. This is where eco_guess plays a role, for example:
x <- c("common fig", "Commn FIG", "RED MAPLE")
eco_guess(x, "botanical")
#> [1] "ficus carica" "ficus carica" "acer rubrum"
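To make the idea of "most similar record" concrete, here is a minimal sketch of similarity matching using edit distance from base R's utils::adist(). This is not treeco's actual implementation, which may use a different similarity measure. The master data frame and the closest_botanical helper are hypothetical names invented for this illustration.

```r
# Hypothetical stand-in for a tiny slice of a master species list.
master <- data.frame(
  common    = c("Black cherry", "Common fig", "Red maple"),
  botanical = c("Prunus serotina", "Ficus carica", "Acer rubrum"),
  stringsAsFactors = FALSE
)

# Return the botanical name of the most similar common name,
# measured by Levenshtein edit distance (utils::adist).
closest_botanical <- function(x, master) {
  # Lowercase both sides so that case differences cost nothing.
  # Rows of d are user values, columns are master-list entries.
  d <- utils::adist(tolower(x), tolower(master$common))
  # For each user value, pick the master entry with the smallest distance.
  tolower(master$botanical[apply(d, 1, which.min)])
}

closest_botanical(c("black cherry tree", "Commn FIG", "RED MAPLE"), master)
#> [1] "prunus serotina" "ficus carica"    "acer rubrum"
```

Note that a nearest-match approach always returns something, even for a value that resembles nothing in the list, so a real implementation would likely want a distance threshold as well.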
And for the trees dataset, I can do something like:
trees <- trees %>%
  mutate(botanical = eco_guess(common, "botanical")) %>%
  print()
#> # A tibble: 31 x 6
#> rn Girth Height Volume common botanical
#> <chr> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 1 8.3 70 10.3 black cherry tree prunus serotina
#> 2 2 8.6 65 10.3 black cherry tree prunus serotina
#> 3 3 8.8 63 10.2 black cherry tree prunus serotina
#> 4 4 10.5 72 16.4 black cherry tree prunus serotina
#> 5 5 10.7 81 18.8 black cherry tree prunus serotina
#> 6 6 10.8 83 19.7 black cherry tree prunus serotina
#> 7 7 11 66 15.6 black cherry tree prunus serotina
#> 8 8 11 75 18.2 black cherry tree prunus serotina
#> 9 9 11.1 80 22.6 black cherry tree prunus serotina
#> 10 10 11.2 75 19.9 black cherry tree prunus serotina
#> # ... with 21 more rows
Finally, we need to identify the region code for Pennsylvania. I don’t have a great way of doing this yet. Adding a function that identifies the region code from a ZIP code, state, city, etc. is on my list. For now, you can use Davey Tree’s tree benefit calculator to figure out the region and then look up its code in the money dataset:
tmoney
#> # A tibble: 144 x 4
#> region region_name conversion value
#> <chr> <chr> <chr> <dbl>
#> 1 CaNCCoJBK Northern California Coast electricity_kwh_to_currency 0.132
#> 2 CenFlaXXX Central Florida electricity_kwh_to_currency 0.132
#> 3 GulfCoCHS Coastal Plain electricity_kwh_to_currency 0.0934
#> 4 InlEmpCLM Inland Empire electricity_kwh_to_currency 0.201
#> 5 InlValMOD Inland Valleys electricity_kwh_to_currency 0.117
#> 6 InterWABQ Interior West electricity_kwh_to_currency 0.0788
#> 7 LoMidWXXX Lower Midwest electricity_kwh_to_currency 0.068
#> 8 MidWstMSP Midwest electricity_kwh_to_currency 0.0759
#> 9 NMtnPrFNL North electricity_kwh_to_currency 0.636
#> 10 NoEastXXX Northeast electricity_kwh_to_currency 0.140
#> # ... with 134 more rows
tmoney %>%
  filter(region_name == "Northeast") %>%
  distinct(region) %>%
  .[[1]]
#> [1] "NoEastXXX"
Before we calculate the benefits, note that most of the steps above won’t usually be necessary; they’re only there to construct and describe the data frame that eco_run_all needs. In most cases, the typical workflow will be:
my_trees <-
  eco_run_all(
    data = trees,
    common_col = "common",
    botanical_col = "botanical",
    dbh_col = "Girth",
    region = "NoEastXXX"
  ) %>%
  as_tibble() %>%
  print()
#> Gathering species matches...
#> Gathering interpolation parameters...
#> Interpolating benefits...
#> # A tibble: 465 x 8
#> botanical common dbh benefit_value benefit unit dollars rn
#> <chr> <chr> <dbl> <dbl> <chr> <chr> <dbl> <chr>
#> 1 prunus sero… black c… 8.3 0.277 aq nox a… lb 1.27 1
#> 2 prunus sero… black c… 8.3 0.067 aq nox d… lb 0.31 1
#> 3 prunus sero… black c… 8.3 0.154 aq ozone… lb 0.71 1
#> 4 prunus sero… black c… 8.3 0.0183 aq pm10 … lb 0.15 1
#> 5 prunus sero… black c… 8.3 0.075 aq pm10 … lb 0.62 1
#> 6 prunus sero… black c… 8.3 0.123 aq sox a… lb 0.43 1
#> 7 prunus sero… black c… 8.3 0.0262 aq sox d… lb 0.09 1
#> 8 prunus sero… black c… 8.3 0.0112 aq voc a… lb 0.03 1
#> 9 prunus sero… black c… 8.3 0 bvoc lb 0 1
#> 10 prunus sero… black c… 8.3 84.4 co2 avoi… lb 0.28 1
#> # ... with 455 more rows
Notice that the height and volume fields are missing. This is because eco_run_all strips the input data of everything except what it needs: the row number, common name, botanical name, and DBH fields. It does this to keep the data small. Not too long ago, eco_run_all took two and a half minutes to calculate the benefits for 400,000 trees. It now takes a couple of seconds, depending on how unique the data is. The removal of unneeded data is why I added the rn field at the beginning: it preserves the row number so the original data can be linked back to the benefits dataset my_trees:
trees %>%
  select(rn, Height, Volume) %>%
  right_join(my_trees) %>%
  glimpse()
#> Joining, by = "rn"
#> Observations: 465
#> Variables: 10
#> $ rn <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"...
#> $ Height <dbl> 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, ...
#> $ Volume <dbl> 10.3, 10.3, 10.3, 10.3, 10.3, 10.3, 10.3, 10.3, ...
#> $ botanical <chr> "prunus serotina", "prunus serotina", "prunus se...
#> $ common <chr> "black cherry", "black cherry", "black cherry", ...
#> $ dbh <dbl> 8.3, 8.3, 8.3, 8.3, 8.3, 8.3, 8.3, 8.3, 8.3, 8.3...
#> $ benefit_value <dbl> 0.2773, 0.0670, 0.1541, 0.0183, 0.0750, 0.1232, ...
#> $ benefit <chr> "aq nox avoided", "aq nox dep", "aq ozone dep", ...
#> $ unit <chr> "lb", "lb", "lb", "lb", "lb", "lb", "lb", "lb", ...
#> $ dollars <dbl> 1.27, 0.31, 0.71, 0.15, 0.62, 0.43, 0.09, 0.03, ...
This is especially useful given that most tree data is spatial and includes coordinates. Whether or not the approach (stripping data, then joining at the end) is a good idea is certainly up for debate and is another reminder of why this package is experimental.
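Because the benefits come back in long format (465 rows for 31 trees, which works out to 15 benefit rows per tree if every tree gets the same set), a per-tree total is a natural next step. Below is a minimal sketch using base R's aggregate() on a toy data frame that mimics the rn, benefit, and dollars columns of my_trees. The name toy_benefits is hypothetical. The dollar values for tree 1 are copied from the first rows of the output above, and the values for tree 2 are made up for illustration.

```r
# Toy stand-in for my_trees with only the columns needed here.
# Tree 1's dollar values are copied from the first rows printed above;
# tree 2's values are made up for illustration.
toy_benefits <- data.frame(
  rn      = c("1", "1", "1", "2", "2"),
  benefit = c("aq nox avoided", "aq nox dep", "aq ozone dep",
              "aq nox avoided", "aq nox dep"),
  dollars = c(1.27, 0.31, 0.71, 1.00, 0.50),
  stringsAsFactors = FALSE
)

# Collapse the long format to one row per tree by summing dollars within rn.
per_tree <- aggregate(dollars ~ rn, data = toy_benefits, FUN = sum)
per_tree
```

Here tree 1 totals 2.29 (1.27 + 0.31 + 0.71). Since the result keeps rn, it can be joined back to the original data in the same way as shown above.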
I have a couple of ideas for the future of treeco:
- Have eco_run_all tell the user how many records were matched.
- Use the sf package for mapping and other applications like guessing the user’s region.
- Expanded coverage to include additional benefits.
Any criticism, issues, or enhancement requests are encouraged and can be filed here.