Research Program and Aims

Research

Plants are rooted in the ground and cannot run away from challenges — instead, they use complex metabolic networks to generate massive arrays of bioactive chemicals and natural biocomposites. With these they colonize deserts, transform Earth's atmosphere, and live for thousands of years. Organic chemists have determined the structures of many thousands of plant chemicals, but biosynthetic pathways to only a handful are known. In the Busta lab, we are closing this gap by uniting analytical chemistry with DNA and RNA sequence data and language model ('artificial intelligence') technologies. Three questions drive us:

  • How can we use emerging language models to broadly advance phytochemical research?
  • What is the genetic basis for chemical diversity — how do biochemical pathways and biocomposites evolve?
  • How can we create knowledge of plant surface biomaterials and their biosynthesis to support the bioeconomy?

Language Models for Phytochemical Research

Scientific articles compile results from decades of effort by researchers worldwide. In phytochemistry, these efforts manifest as data describing the occurrence of specific compounds across the plant tree of life. We are exploring the potential of transformer-based language models to extract and systematize this data at scale. Using thousands of manually annotated abstracts, we have measured the ability of language models to identify chemical occurrence reports, confirm known lineage-specific distributions, and uncover previously unrecognized hotspots of bioactive compounds. Current projects include:

  • Automating the extraction and organization of plant chemical data from scientific literature using large and small language models.
  • Applying protein language models to address long-standing questions about enzyme structure–function relationships (funded by a Cottrell Scholar Award from the Research Corporation for Science Advancement).

In the long term, we envision language model-enabled organization, archival, and preservation of plant chemical research results for large-scale, community-wide analysis.

Phylogenetic Mapping of Plant Chemistry

Chemical diversity is a hallmark of plants, and it is connected to major genomic events including horizontal gene transfer, gene clustering, and whole-genome duplication. Integrating plant genomic data with chemical profiles has helped predict diversity hotspots, revealed genetic mechanisms of natural product synthesis, and uncovered unique metabolic pathways with industrial or agricultural potential. We are conducting analyses of chemical occurrence in a phylogenetic context and also building tools to make this work more accessible. Current projects include:

  • Developing methods to use preserved herbarium specimens for streamlined access to plant chemical profile data across the tree of life.
  • Building tools that integrate chemical structural information into phylogenetic comparative analyses, including molecular representation learning approaches for pathway discovery.
  • Collaborating across institutions to map specialized metabolites (including wax traits, non-proteinogenic amino acids, and dopamine-derived compounds) onto plant phylogenies.

Looking ahead, we plan to build a phylochemical atlas as a community resource, combining language model-extracted occurrence data with genomic datasets to enable predictive identification of lineages with economically important metabolism.

Plant Wax Blooms and Hydrophobic Natural Products

Virtually all land plants coat themselves with waxes to prevent nonstomatal water loss, but some species accumulate such large amounts that wax is visible to the naked eye as a white "bloom." Epidermal cells in multiple plant lineages have independently developed the ability to synthesize hydrophobic natural products in massive quantities and export them to the surface, where they can be harvested. This phenomenon has the potential to inspire biotechnological systems designed to produce natural products at scale. Current projects include:

  • Probing the long-standing question of wax structure–function relationships by using synthetic wax discs to test systematically how composition determines water barrier properties (funded by an American Chemical Society Petroleum Research Fund grant).
  • Developing sorghum kernel waxes as a novel domestic co-product to replace imported carnauba wax (funded by a USDA NIFA grant).
  • Investigating the molecular basis of wax bloom induction, including photoreceptor and transporter networks, using sorghum and Kalanchoe as model systems.

Our long-term goal is to use fundamental knowledge of wax biosynthesis and bloom induction to inform synthetic biology systems for producing high-value hydrophobic natural products.