phylogenetic analyses

We use a wrapper function to run phylogenetic comparative analyses. It

chemical_bloom_tree <- buildTree(
  scaffold_type = "newick",
  scaffold_in_path = "http://thebustalab.github.io/data/angiosperms.newick",
  members = unique(chemical_blooms$label)
)
## Scaffold newick tip Sabal_pumos substituted with Sabal_palmetto
## Scaffold newick tip Iris_lazica substituted with Iris_sp
## Scaffold newick tip Iris_lacustris substituted with Iris_germanica
## Scaffold newick tip Allium_textile substituted with Allium_sp
## Scaffold newick tip Allium_subhirsutum substituted with Allium_brevistylum
## Scaffold newick tip Ornithogalum_saundersiae substituted with Ornithogalum_candicans
## Scaffold newick tip Hosta_plantaginea substituted with Hosta_sp
## Scaffold newick tip Agave_striata substituted with Agave_cerulata
## Scaffold newick tip Agave_tequilana substituted with Agave_chrysantha
## Scaffold newick tip Aristolochia_serpentaria substituted with Aristolochia_labiata
## Scaffold newick tip Thalictrum_clavatum substituted with Thalictrum_rochebrunianum
## Scaffold newick tip Delphinium_pictum substituted with Delphinium_elatum
## Scaffold newick tip Ferocactus_recurvus substituted with Ferocactus_wislizeni
## Scaffold newick tip Opuntia_articulata substituted with Opuntia_sp
## Scaffold newick tip Opuntia_quimilo substituted with Opuntia_robusta
## Scaffold newick tip Cylindropuntia_fulgida substituted with Cylindropuntia_bigelovii
## Scaffold newick tip Cylindropuntia_prolifera substituted with Cylindropuntia_versicolor
## Scaffold newick tip Cylindropuntia_echinocarpa substituted with Cylindropuntia_ramosissima
## Scaffold newick tip Kirengeshoma_palmata substituted with Kirengeshoma_Palmata
## Scaffold newick tip Lavandula_bipinnata substituted with Lavandula_sp
## Scaffold newick tip Rudbeckia_hirta substituted with Rudbeckia_occidentalis
## Scaffold newick tip Crassula_campestris substituted with Crassula_ovata
## Scaffold newick tip Crassula_tillaea substituted with Crassula_deceptor
## Scaffold newick tip Crassula_alata substituted with Crassula_arborescens
## Scaffold newick tip Crassula_colligata substituted with Crassula_perfoliata
## Scaffold newick tip Hylotelephium_erythrostictum substituted with Hylotelephium_sp
## Scaffold newick tip Echeveria_setosa substituted with Echeveria_pulidonis
## Scaffold newick tip Bryophyllum_pinnatum substituted with Bryophyllum_fedtschenkoi
## Scaffold newick tip Kalanchoe_linearifolia substituted with Kalanchoe_marginata
## Scaffold newick tip Kalanchoe_tomentosa substituted with Kalanchoe_luciae
## Scaffold newick tip Kalanchoe_beharensis substituted with Kalanchoe_thrysiflora
## Scaffold newick tip Iliamna_latibracteata substituted with Iliamna_rivularis
## Scaffold newick tip Euphorbia_lathyris substituted with Euphorbia_resinifera
## Scaffold newick tip Quercus_valdinervosa substituted with Quercus_muehlenbergii
## Scaffold newick tip Rubus_repens substituted with Rubus_sp
## Pro tip: most tree read/write functions reset node numbers.
## Fortify your tree and save it as a csv file to preserve node numbering.
## Do not save your tree as a newick or nexus file.

runPhylogeneticAnalyses(
    traits = pivot_longer(chemical_blooms[,1:4], cols = c(3:4), names_to = "trait", values_to = "value"),
    column_w_names_of_tiplabels = "label",
    column_w_names_of_traits = "trait",
    column_w_values_for_traits = "value",
    tree = chemical_bloom_tree
)
## Joining with `by = join_by(trait)`
## # A tibble: 310 × 17
##    parent  node branch.length label isTip     x     y branch
##     <int> <dbl>         <dbl> <chr> <lgl> <dbl> <dbl>  <dbl>
##  1     80     1         290.  Gink… TRUE   352.     1   207.
##  2     80     1         290.  Gink… TRUE   352.     1   207.
##  3     81     2         267.  Pice… TRUE   352.     2   219.
##  4     81     2         267.  Pice… TRUE   352.     2   219.
##  5     81     3         267.  Cupr… TRUE   352.     3   219.
##  6     81     3         267.  Cupr… TRUE   352.     3   219.
##  7     84     4         135.  Eryt… TRUE   352.     5   285.
##  8     84     4         135.  Eryt… TRUE   352.     5   285.
##  9     86     5          16.2 Iris… TRUE   352.     6   344.
## 10     86     5          16.2 Iris… TRUE   352.     6   344.
## # ℹ 300 more rows
## # ℹ 9 more variables: angle <dbl>, trait <chr>,
## #   value <dbl>, trait_type <chr>,
## #   phylogenetic_signal_k_value <dbl>,
## #   phylogenetic_signal_k_p_value <dbl>,
## #   phylogenetic_signal_lambda_value <dbl>,
## #   phylogenetic_signal_lambda_p_value <dbl>, pic <dbl>

For all the below, there are some structural requirements: (i) the tree needs to be a phylo object (ii) the traits need to be a data.frame in which each row is a species and each column is a variable, and (iii) the first column in the data.frame needs to be the names of the species and they must exactly match the tip labels of the tree (though they don’t have to be in the same order), for example:

chemical_bloom_tree <- buildTree(
  scaffold_type = "newick",
  scaffold_in_path = "http://thebustalab.github.io/data/angiosperms.newick",
  members = unique(chemical_blooms$label)
)
## Scaffold newick tip Sabal_pumos substituted with Sabal_palmetto
## Scaffold newick tip Iris_lazica substituted with Iris_sp
## Scaffold newick tip Iris_lacustris substituted with Iris_germanica
## Scaffold newick tip Allium_textile substituted with Allium_sp
## Scaffold newick tip Allium_subhirsutum substituted with Allium_brevistylum
## Scaffold newick tip Ornithogalum_saundersiae substituted with Ornithogalum_candicans
## Scaffold newick tip Hosta_plantaginea substituted with Hosta_sp
## Scaffold newick tip Agave_striata substituted with Agave_cerulata
## Scaffold newick tip Agave_tequilana substituted with Agave_chrysantha
## Scaffold newick tip Aristolochia_serpentaria substituted with Aristolochia_labiata
## Scaffold newick tip Thalictrum_clavatum substituted with Thalictrum_rochebrunianum
## Scaffold newick tip Delphinium_pictum substituted with Delphinium_elatum
## Scaffold newick tip Ferocactus_recurvus substituted with Ferocactus_wislizeni
## Scaffold newick tip Opuntia_articulata substituted with Opuntia_sp
## Scaffold newick tip Opuntia_quimilo substituted with Opuntia_robusta
## Scaffold newick tip Cylindropuntia_fulgida substituted with Cylindropuntia_bigelovii
## Scaffold newick tip Cylindropuntia_prolifera substituted with Cylindropuntia_versicolor
## Scaffold newick tip Cylindropuntia_echinocarpa substituted with Cylindropuntia_ramosissima
## Scaffold newick tip Kirengeshoma_palmata substituted with Kirengeshoma_Palmata
## Scaffold newick tip Lavandula_bipinnata substituted with Lavandula_sp
## Scaffold newick tip Rudbeckia_hirta substituted with Rudbeckia_occidentalis
## Scaffold newick tip Crassula_campestris substituted with Crassula_ovata
## Scaffold newick tip Crassula_tillaea substituted with Crassula_deceptor
## Scaffold newick tip Crassula_alata substituted with Crassula_arborescens
## Scaffold newick tip Crassula_colligata substituted with Crassula_perfoliata
## Scaffold newick tip Hylotelephium_erythrostictum substituted with Hylotelephium_sp
## Scaffold newick tip Echeveria_setosa substituted with Echeveria_pulidonis
## Scaffold newick tip Bryophyllum_pinnatum substituted with Bryophyllum_fedtschenkoi
## Scaffold newick tip Kalanchoe_linearifolia substituted with Kalanchoe_marginata
## Scaffold newick tip Kalanchoe_tomentosa substituted with Kalanchoe_luciae
## Scaffold newick tip Kalanchoe_beharensis substituted with Kalanchoe_thrysiflora
## Scaffold newick tip Iliamna_latibracteata substituted with Iliamna_rivularis
## Scaffold newick tip Euphorbia_lathyris substituted with Euphorbia_resinifera
## Scaffold newick tip Quercus_valdinervosa substituted with Quercus_muehlenbergii
## Scaffold newick tip Rubus_repens substituted with Rubus_sp
## Pro tip: most tree read/write functions reset node numbers.
## Fortify your tree and save it as a csv file to preserve node numbering.
## Do not save your tree as a newick or nexus file.

phylogeneticSignal

Phylogenetic signal is a measure of the degree to which related species share similar trait values. It is used to determine whether a trait has evolved in a manner that is consistent with the species’ evolutionary history. phylochemistry provides the phylogeneticSignal function, which can be used to calculate phylogenetic signal for a given set of traits and a phylogenetic tree. Here is an example:

phylogeneticSignal(
  traits = pivot_longer(chemical_blooms, cols = c(2:10), names_to = "compound", values_to = "value"),
  column_w_names_of_tiplabels = "label",
  column_w_names_of_traits = "compound",
  column_w_values_for_traits = "value",
  tree = chemical_bloom_tree
)
##             trait trait_type n_species number_of_levels
## 1         Alkanes continuous        78               NA
## 2    Sec_Alcohols continuous        78               NA
## 3          Others continuous        78               NA
## 4     Fatty_acids continuous        78               NA
## 5        Alcohols continuous        78               NA
## 6   Triterpenoids continuous        78               NA
## 7         Ketones continuous        78               NA
## 8 Other_compounds continuous        78               NA
## 9       Aldehydes continuous        78               NA
##   evolutionary_transitions_observed
## 1                                NA
## 2                                NA
## 3                                NA
## 4                                NA
## 5                                NA
## 6                                NA
## 7                                NA
## 8                                NA
## 9                                NA
##   median_evolutionary_transitions_in_randomization
## 1                                               NA
## 2                                               NA
## 3                                               NA
## 4                                               NA
## 5                                               NA
## 6                                               NA
## 7                                               NA
## 8                                               NA
## 9                                               NA
##   minimum_evolutionary_transitions_in_randomization
## 1                                                NA
## 2                                                NA
## 3                                                NA
## 4                                                NA
## 5                                                NA
## 6                                                NA
## 7                                                NA
## 8                                                NA
## 9                                                NA
##   evolutionary_transitions_in_randomization
## 1                                        NA
## 2                                        NA
## 3                                        NA
## 4                                        NA
## 5                                        NA
## 6                                        NA
## 7                                        NA
## 8                                        NA
## 9                                        NA
##   phylogenetic_signal_k_value phylogenetic_signal_k_p_value
## 1                 0.066016688                         0.167
## 2                 2.045611108                         0.001
## 3                 0.029719595                         0.711
## 4                 0.069056092                         0.292
## 5                 0.053761730                         0.354
## 6                 0.566806846                         0.001
## 7                 0.239488396                         0.030
## 8                 0.018420367                         0.868
## 9                 0.008613879                         0.928
##   phylogenetic_signal_lambda_value
## 1                           0.0001
## 2                           0.9999
## 3                           0.0001
## 4                           0.0001
## 5                           0.0001
## 6                           0.9999
## 7                           0.7874
## 8                           0.0001
## 9                           0.0001
##   phylogenetic_signal_lambda_p_value
## 1                              1.000
## 2                              0.000
## 3                              1.000
## 4                              1.000
## 5                              1.000
## 6                              0.000
## 7                              0.045
## 8                              1.000
## 9                              1.000

independentContrasts

Phylogenetic independent contrasts are a method for analyzing the relationship between two or more traits while taking into account the evolutionary history of the species being studied. This method involves transforming the data in to “independent contrasts” to remove the effects of shared ancestry, allowing for more accurate analysis of the relationship between traits. phylochemistry provides the independentContrasts function to calculate phylogenetic independent contrasts for a given set of traits and a phylogenetic tree. Here is an example of calculating independent contrasts for an example dataset, followed by generating a linear model based on the contrasts.

contrasts <- independentContrasts(
  traits = pivot_longer(chemical_blooms, cols = c(2:10), names_to = "compound", values_to = "value"),
  column_w_names_of_tiplabels = "label",
  column_w_names_of_traits = "compound",
  column_w_values_for_traits = "value",
  tree = chemical_bloom_tree
)

# buildLinearModel(
#   data = contrasts,
#   formula = "Fatty_acids = Alkanes + 0"
# ) -> model
# 
# ggplot(model$data) +
#   geom_point(aes(x = input_x, y = input_y)) +
#   geom_line(aes(x = model_x, model_y))

ancestralTraits

Ancestral trait reconstruction is a method to infer the characteristics (or “traits”) of ancestral organisms based on the traits of their modern descendants. By examining the traits of present-day species and using phylogenetic trees, we can estimate or “reconstruct” the traits of common ancestors. This method can be applied to various types of traits, including continuously varying and discrete traits. Ancestral trait reconstruction helps us gain insights into the evolutionary processes and the historical transitions that led to current biodiversity. phylochemistry provides the function ancestralTraits to perform these operations. Note that ancestralTraits is different from buildTrees “ancestral_states”. “ancestral_states” estimates ancestral sequence states at phylogeny nodes, while ancestralTraits will estimate the traits of an ancestor, given the traits of extant species that are present on the leaves of a phylogeny. Here is an example. Please note that ancestralTraits accepts data in a long-style data frame.

anc_traits_tree <- ancestralTraits(
  traits = pivot_longer(chemical_blooms, cols = -1),
  column_w_names_of_tiplabels = "label",
  column_w_names_of_traits = "name",
  column_w_values_for_traits = "value",
  tree = chemical_bloom_tree
)
head(anc_traits_tree)
## # A tibble: 6 × 11
##   parent  node branch.length label  isTip     x     y branch
##    <int> <dbl>         <dbl> <chr>  <lgl> <dbl> <dbl>  <dbl>
## 1     80     1          290. Ginkg… TRUE   352.     1   207.
## 2     80     1          290. Ginkg… TRUE   352.     1   207.
## 3     80     1          290. Ginkg… TRUE   352.     1   207.
## 4     80     1          290. Ginkg… TRUE   352.     1   207.
## 5     80     1          290. Ginkg… TRUE   352.     1   207.
## 6     80     1          290. Ginkg… TRUE   352.     1   207.
## # ℹ 3 more variables: angle <dbl>, trait <chr>, value <dbl>

In addition to providing ancestral state estimations, there is also a function for plotting those estimations on a phylogeny: geom_ancestral_pie. Here is an example. Note that cols is a vector of column numbers that correspond to the traits of interest. pie_size is the size of the pie chart that will be plotted at each node. geom_ancestral_pie relies on having columns in its input called trait and value, such as those output by ancestralTraits. Note that if you are passing an object to ggtree() that has duplicate node names, you will need to use the distinct function to remove the duplicates, otherwise geom_ancestral_pie will get confused about where to place the pies.

ggtree(
  distinct(anc_traits_tree, node, .keep_all = TRUE)
) +
  geom_ancestral_pie(
    data = filter(anc_traits_tree, isTip == FALSE),
    pie_size = 0.1, pie_alpha = 1
  ) +
  geom_tiplab(offset = 20, align = TRUE) +
  scale_x_continuous(limits = c(0,650)) +
  theme_void()