phylogenetic analyses

We use a wrapper function to run phylogenetic comparative analyses. It

chemical_bloom_tree <- buildTree(
  scaffold_type = "newick",
  scaffold_in_path = "http://thebustalab.github.io/data/angiosperms.newick",
  members = unique(chemical_blooms$label)
)
## IMPORTANT: Some species substitutions or removals were made as part of buildTree. Run build_tree_substitutions() to see them all.
## Pro tip: most tree read/write functions reset node numbers. Fortify your tree and save it as a csv file to preserve node numbering.

runPhylogeneticAnalyses(
    traits = pivot_longer(chemical_blooms[,1:4], cols = c(3:4), names_to = "trait", values_to = "value"),
    column_w_names_of_tiplabels = "label",
    column_w_names_of_traits = "trait",
    column_w_values_for_traits = "value",
    tree = chemical_bloom_tree
)
## Joining with `by = join_by(trait)`
## # A tibble: 310 × 17
##    parent  node branch.length label isTip     x     y branch
##     <int> <dbl>         <dbl> <chr> <lgl> <dbl> <dbl>  <dbl>
##  1     80     1         290.  Gink… TRUE   352.     1   207.
##  2     80     1         290.  Gink… TRUE   352.     1   207.
##  3     81     2         267.  Pice… TRUE   352.     2   219.
##  4     81     2         267.  Pice… TRUE   352.     2   219.
##  5     81     3         267.  Cupr… TRUE   352.     3   219.
##  6     81     3         267.  Cupr… TRUE   352.     3   219.
##  7     84     4         135.  Eryt… TRUE   352.     5   285.
##  8     84     4         135.  Eryt… TRUE   352.     5   285.
##  9     86     5          16.2 Iris… TRUE   352.     6   344.
## 10     86     5          16.2 Iris… TRUE   352.     6   344.
## # ℹ 300 more rows
## # ℹ 9 more variables: angle <dbl>, trait <chr>,
## #   value <dbl>, trait_type <chr>,
## #   phylogenetic_signal_k_value <dbl>,
## #   phylogenetic_signal_k_p_value <dbl>,
## #   phylogenetic_signal_lambda_value <dbl>,
## #   phylogenetic_signal_lambda_p_value <dbl>, pic <dbl>

For all the below, there are some structural requirements: (i) the tree needs to be a phylo object (ii) the traits need to be a data.frame in which each row is a species and each column is a variable, and (iii) the first column in the data.frame needs to be the names of the species and they must exactly match the tip labels of the tree (though they don’t have to be in the same order), for example:

chemical_bloom_tree <- buildTree(
  scaffold_type = "newick",
  scaffold_in_path = "http://thebustalab.github.io/data/angiosperms.newick",
  members = unique(chemical_blooms$label)
)
## IMPORTANT: Some species substitutions or removals were made as part of buildTree. Run build_tree_substitutions() to see them all.
## Pro tip: most tree read/write functions reset node numbers. Fortify your tree and save it as a csv file to preserve node numbering.

phylogeneticSignal

Phylogenetic signal is a measure of the degree to which related species share similar trait values. It is used to determine whether a trait has evolved in a manner that is consistent with the species’ evolutionary history. phylochemistry provides the phylogeneticSignal function, which can be used to calculate phylogenetic signal for a given set of traits and a phylogenetic tree. Here is an example:

phylogeneticSignal(
  traits = pivot_longer(chemical_blooms, cols = c(2:10), names_to = "compound", values_to = "value"),
  column_w_names_of_tiplabels = "label",
  column_w_names_of_traits = "compound",
  column_w_values_for_traits = "value",
  tree = chemical_bloom_tree
)
##             trait trait_type n_species number_of_levels
## 1         Alkanes continuous        78               NA
## 2    Sec_Alcohols continuous        78               NA
## 3          Others continuous        78               NA
## 4     Fatty_acids continuous        78               NA
## 5        Alcohols continuous        78               NA
## 6   Triterpenoids continuous        78               NA
## 7         Ketones continuous        78               NA
## 8 Other_compounds continuous        78               NA
## 9       Aldehydes continuous        78               NA
##   evolutionary_transitions_observed
## 1                                NA
## 2                                NA
## 3                                NA
## 4                                NA
## 5                                NA
## 6                                NA
## 7                                NA
## 8                                NA
## 9                                NA
##   median_evolutionary_transitions_in_randomization
## 1                                               NA
## 2                                               NA
## 3                                               NA
## 4                                               NA
## 5                                               NA
## 6                                               NA
## 7                                               NA
## 8                                               NA
## 9                                               NA
##   minimum_evolutionary_transitions_in_randomization
## 1                                                NA
## 2                                                NA
## 3                                                NA
## 4                                                NA
## 5                                                NA
## 6                                                NA
## 7                                                NA
## 8                                                NA
## 9                                                NA
##   evolutionary_transitions_in_randomization
## 1                                        NA
## 2                                        NA
## 3                                        NA
## 4                                        NA
## 5                                        NA
## 6                                        NA
## 7                                        NA
## 8                                        NA
## 9                                        NA
##   phylogenetic_signal_k_value phylogenetic_signal_k_p_value
## 1                 0.066016688                         0.167
## 2                 2.045611108                         0.001
## 3                 0.029719595                         0.711
## 4                 0.069056092                         0.292
## 5                 0.053761730                         0.354
## 6                 0.566806846                         0.001
## 7                 0.239488396                         0.030
## 8                 0.018420367                         0.868
## 9                 0.008613879                         0.928
##   phylogenetic_signal_lambda_value
## 1                           0.0001
## 2                           0.9999
## 3                           0.0001
## 4                           0.0001
## 5                           0.0001
## 6                           0.9999
## 7                           0.7874
## 8                           0.0001
## 9                           0.0001
##   phylogenetic_signal_lambda_p_value
## 1                              1.000
## 2                              0.000
## 3                              1.000
## 4                              1.000
## 5                              1.000
## 6                              0.000
## 7                              0.045
## 8                              1.000
## 9                              1.000

independentContrasts

Phylogenetic independent contrasts are a method for analyzing the relationship between two or more traits while taking into account the evolutionary history of the species being studied. This method involves transforming the data in to “independent contrasts” to remove the effects of shared ancestry, allowing for more accurate analysis of the relationship between traits. phylochemistry provides the independentContrasts function to calculate phylogenetic independent contrasts for a given set of traits and a phylogenetic tree. Here is an example of calculating independent contrasts for an example dataset, followed by generating a linear model based on the contrasts.

contrasts <- independentContrasts(
  traits = pivot_longer(chemical_blooms, cols = c(2:10), names_to = "compound", values_to = "value"),
  column_w_names_of_tiplabels = "label",
  column_w_names_of_traits = "compound",
  column_w_values_for_traits = "value",
  tree = chemical_bloom_tree
)

# buildLinearModel(
#   data = contrasts,
#   formula = "Fatty_acids = Alkanes + 0"
# ) -> model
# 
# ggplot(model$data) +
#   geom_point(aes(x = input_x, y = input_y)) +
#   geom_line(aes(x = model_x, model_y))

ancestralTraits

Ancestral trait reconstruction is a method to infer the characteristics (or “traits”) of ancestral organisms based on the traits of their modern descendants. By examining the traits of present-day species and using phylogenetic trees, we can estimate or “reconstruct” the traits of common ancestors. This method can be applied to various types of traits, including continuously varying and discrete traits. Ancestral trait reconstruction helps us gain insights into the evolutionary processes and the historical transitions that led to current biodiversity. phylochemistry provides the function ancestralTraits to perform these operations. Note that ancestralTraits is different from buildTrees “ancestral_states”. “ancestral_states” estimates ancestral sequence states at phylogeny nodes, while ancestralTraits will estimate the traits of an ancestor, given the traits of extant species that are present on the leaves of a phylogeny. Here is an example. Please note that ancestralTraits accepts data in a long-style data frame.

anc_traits_tree <- ancestralTraits(
  traits = pivot_longer(chemical_blooms, cols = -1),
  column_w_names_of_tiplabels = "label",
  column_w_names_of_traits = "name",
  column_w_values_for_traits = "value",
  tree = chemical_bloom_tree
)
head(anc_traits_tree)
## # A tibble: 6 × 11
##   parent  node branch.length label  isTip     x     y branch
##    <int> <dbl>         <dbl> <chr>  <lgl> <dbl> <dbl>  <dbl>
## 1     80     1          290. Ginkg… TRUE   352.     1   207.
## 2     80     1          290. Ginkg… TRUE   352.     1   207.
## 3     80     1          290. Ginkg… TRUE   352.     1   207.
## 4     80     1          290. Ginkg… TRUE   352.     1   207.
## 5     80     1          290. Ginkg… TRUE   352.     1   207.
## 6     80     1          290. Ginkg… TRUE   352.     1   207.
## # ℹ 3 more variables: angle <dbl>, trait <chr>, value <dbl>

In addition to providing ancestral state estimations, there is also a function for plotting those estimations on a phylogeny: geom_ancestral_pie. Here is an example. Note that cols is a vector of column numbers that correspond to the traits of interest. pie_size is the size of the pie chart that will be plotted at each node. geom_ancestral_pie relies on having columns in its input called trait and value, such as those output by ancestralTraits. Note that if you are passing an object to ggtree() that has duplicate node names, you will need to use the distinct function to remove the duplicates, otherwise geom_ancestral_pie will get confused about where to place the pies.

ggtree(
  distinct(anc_traits_tree, node, .keep_all = TRUE)
) +
  geom_ancestral_pie(
    data = filter(anc_traits_tree, isTip == FALSE),
    pie_size = 0.1, pie_alpha = 1
  ) +
  geom_tiplab(offset = 20, align = TRUE) +
  scale_x_continuous(limits = c(0,650)) +
  theme_void()