transcriptome assembly

For transcriptome analysis of non-model species, we recommend transXpress. We often use a modified version of transXpress that we call transXpressLite. It uses the Trinity assembler by default, which can require 500 GB or more of free disk space to run. Depending on your machine, you may also need to modify transXpressLite for it to run properly.
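
Because running out of disk space partway through an assembly is costly, it can help to check free space first. The following is a minimal sketch, not part of transXpressLite: the function names and the REQUIRED_GB variable are made up for illustration, and "GB" here means 1024-based units as reported by df.

```shell
#!/usr/bin/env bash
# Sketch: check that the assembly directory has enough free space before starting.
# REQUIRED_GB mirrors the 500 GB guideline above; adjust it for your data.

REQUIRED_GB=500

# Print the free space (whole GB, 1024-based) on the filesystem holding directory $1.
free_gb() {
  # df -P gives stable POSIX columns; -k reports 1024-byte blocks; column 4 is "Available".
  df -Pk "$1" | awk 'NR==2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Return 0 if at least $2 GB are free in directory $1, otherwise 1.
has_enough_space() {
  [ "$(free_gb "$1")" -ge "$2" ]
}

if has_enough_space "." "$REQUIRED_GB"; then
  echo "OK: $(free_gb .) GB free"
else
  echo "Warning: only $(free_gb .) GB free; Trinity may need ${REQUIRED_GB} GB or more"
fi
```

Run it from the directory in which you plan to perform the assembly, since df reports on the filesystem that holds the given path.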

Download the transXpressLite code into the directory in which you wish to perform the assembly:

git clone https://github.com/thebustalab/transXpressLite.git

Rename the downloaded folder to make it unique. Perhaps:

mv transXpressLite transXpressLite-kfed

Move into that directory and set up and activate the main transXpress environment:

cd transXpressLite-kfed
conda create --name transxpress
conda activate transxpress

Create a tab-separated file called samples.txt with the following contents. Important! Remember that there must be an empty line at the end of the samples.txt file.

cond_A	cond_A_rep1	A_rep1_left.fq	A_rep1_right.fq
cond_A	cond_A_rep2	A_rep2_left.fq	A_rep2_right.fq
cond_B	cond_B_rep1	B_rep1_left.fq	B_rep1_right.fq
cond_B	cond_B_rep2	B_rep2_left.fq	B_rep2_right.fq
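
Tabs and the final line ending are easy to get wrong in a text editor, so one option is to write the file from the shell and then check it. This is a sketch under two assumptions: the helper names write_samples and check_samples are made up here, and the "empty line at the end" requirement is read as "the file ends with a newline character".

```shell
#!/usr/bin/env bash
# Sketch: write samples.txt with real tab characters and check its format.

# printf reuses the format for each group of four arguments,
# so every row gets three tabs and a closing newline.
write_samples() {
  printf '%s\t%s\t%s\t%s\n' \
    cond_A cond_A_rep1 A_rep1_left.fq A_rep1_right.fq \
    cond_A cond_A_rep2 A_rep2_left.fq A_rep2_right.fq \
    cond_B cond_B_rep1 B_rep1_left.fq B_rep1_right.fq \
    cond_B cond_B_rep2 B_rep2_left.fq B_rep2_right.fq > "$1"
}

# Return 0 if file $1 is non-empty, has four tab-separated fields on
# every non-empty line, and ends with a newline; otherwise return 1.
check_samples() {
  [ -s "$1" ] || return 1
  awk -F '\t' 'NF > 0 && NF != 4 { bad = 1 } END { exit bad }' "$1" || return 1
  # Command substitution strips a trailing newline, so an empty result
  # means the last byte of the file is a newline.
  [ -z "$(tail -c 1 "$1")" ]
}

write_samples samples.txt
check_samples samples.txt && echo "samples.txt looks well-formed"
```

Replace the condition, replicate, and read file names with your own. check_samples can also be run on a hand-edited file.
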
Start transXpressLite:

./transXpress.sh

Once transXpress is complete, you may wish to move its output files to a long-term storage device. Keep the following files handy for downstream analysis, though:

all the sample name folders (e.g. “fed_epi_hi_rep1”)
samples.txt
samples_trimmed.txt
busco_report.txt
/transdecoder/longest_orfs.pep (this file is actually “transcriptome.orfs”)
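
The files listed above can be gathered into one folder before the rest of the run is archived. The sketch below is one way to do that. The keep_outputs function and the destination path are hypothetical, and it assumes each sample folder is named after the second column of samples.txt, which you should confirm against your own run.

```shell
#!/usr/bin/env bash
# Sketch: copy the files listed above from a finished run into a "keep" folder.
# Assumption: each sample folder is named after column 2 of samples.txt.

keep_outputs() {
  run_dir=$1   # directory where transXpressLite was run
  keep_dir=$2  # hypothetical destination for files to keep handy

  mkdir -p "$keep_dir/transdecoder"

  # Individual files named in the list above.
  for f in samples.txt samples_trimmed.txt busco_report.txt transdecoder/longest_orfs.pep; do
    if [ -f "$run_dir/$f" ]; then
      cp "$run_dir/$f" "$keep_dir/$f"
    else
      echo "missing: $run_dir/$f" >&2
    fi
  done

  # Sample folders, looked up by the replicate names in samples.txt.
  if [ -f "$run_dir/samples.txt" ]; then
    cut -f 2 "$run_dir/samples.txt" | while IFS= read -r sample; do
      if [ -n "$sample" ] && [ -d "$run_dir/$sample" ]; then
        cp -R "$run_dir/$sample" "$keep_dir/"
      fi
    done
  fi
}

# Example call (paths are placeholders):
# keep_outputs . ../transXpressLite-kfed-keep
```

Missing files are reported on standard error rather than stopping the copy, so you can see at a glance which outputs a run did not produce.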

# reads <- readFasta("https://drive.google.com/file/d/1r6E0U5LyYwjWenxy9yqh5QQ2mq1umWOW/view?usp=sharing")

# # post <- readFasta("/Users/bust0037/Desktop/ragtag.scaffold.fasta")
# n_chroms <- 18

# pb <- progress::progress_bar$new(total = n_chroms)

# out <- list()

# for (i in 1:n_chroms) {

#   pb$tick()

#   dat <- strsplit(substr(as.character(post[i]), 1, 50000000), "")[[1]]
  
#   b <- rle(dat)

#   # Create a data frame
#   dt <- data.frame(number = b$values, lengths = b$lengths, scaff = i)
#   # Get the end
#   dt$end <- cumsum(dt$lengths)
#   # Get the start
#   dt$start <- dt$end - dt$lengths + 1

#   # Select columns
#   dt <- dt[, c("number", "start", "end", "scaff")]
#   # Sort rows
#   dt <- dt[order(dt$number), ]

#   dt %>%
#     filter(number == "N") -> N_dat

#   out[[i]] <- N_dat

# }

# out <- do.call(rbind, out)


# chroms <- data.frame(
#   lengths = post@ranges@width[1:n_chroms],
#   scaff = seq(1,n_chroms,1)
# )

# ggplot() +
#   statebins:::geom_rrect(data = chroms, aes(xmin = 0, xmax = lengths, ymin = -1, ymax = 1, fill = scaff), color = "black") +
#   geom_rect(data = out, aes(xmin = start, xmax = end, ymin = -0.95, ymax = 0.95), color = "white", fill = "white", size = 0.08) +
#   facet_grid(scaff~.) +
#   scale_fill_viridis(end = 0.8) +
#   theme_classic()

# ggplot() +
#   geom_rect(data = filter(chroms, scaff == 1 | scaff == 2), aes(xmin = 0, xmax = lengths, ymin = -1, ymax = 1, fill = scaff), color = "black") +
#   geom_rect(data = filter(out, scaff == 1 | scaff == 2), aes(xmin = start, xmax = end, ymin = -0.95, ymax = 0.95), color = "white", fill = "white", size = 0.08) +
#   facet_grid(scaff~.) +
#   scale_y_continuous(limits = c(-2,2)) +
#   scale_fill_viridis(end = 0.8) +
#   theme_classic() +
#   coord_polar()
