Evaluating your transcriptome assembly

We will be using Transrate and BUSCO!

BUSCO

  • Benchmarking Universal Single Copy Orthologs (BUSCO)
  • Eukaryota database has 303 genes
  • Metazoa database has 978 genes
  • A BUSCO is scored "Complete" when the length of its match is within two standard deviations of the BUSCO group mean length
  • Genes that make up the BUSCO sets for each major lineage are selected from orthologous groups with genes present as single-copy orthologs in at least 90% of the species.
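
The length criterion for "Complete" listed above can be sketched as a small check. This is only an illustration of the two-standard-deviation rule as stated here, not BUSCO's own code, and BUSCO's actual classification involves more than this one test. The helper name `within_two_sd` and the numbers in the usage line are made up for the example.

```shell
# Hypothetical helper (not part of BUSCO): succeeds when a match length lies
# within two standard deviations of the BUSCO group mean length.
within_two_sd() {
    # $1: match length, $2: group mean length, $3: group standard deviation
    awk -v len="$1" -v mean="$2" -v sd="$3" 'BEGIN {
        d = len - mean
        if (d < 0) d = -d          # absolute deviation from the mean
        exit !(d <= 2 * sd)        # exit status 0 means "within range"
    }'
}

# Made-up numbers: a match of length 950 against a group mean of 1000, SD 30.
if within_two_sd 950 1000 30; then
    echo "length is consistent with a Complete call"
fi
```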

Useful links:

  • Website with additional BUSCO databases: http://busco.ezlab.org/
  • Paper: Simão et al. 2015
  • User Guide

Command:

run_BUSCO.py \
-i Trinity.fixed.fasta \
-o nema_busco_metazoa -l ~/busco/metazoa_odb9 \
-m transcriptome --cpu 2
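
Once the run finishes, the headline number is the percentage of complete BUSCOs. The sketch below pulls it out of the short summary file. It assumes the output naming used by this version of BUSCO (`run_<name>/short_summary_<name>.txt`) and a summary line of the form `C:..%[S:..%,D:..%],F:..%,M:..%,n:978`. Both are assumptions to check against your own output, and the helper name `busco_complete_pct` is hypothetical.

```shell
# Hypothetical helper: print the "Complete" percentage from a BUSCO short summary.
# Assumes a summary line shaped like this made-up example:
#   C:85.0%[S:60.0%,D:25.0%],F:5.0%,M:10.0%,n:978
busco_complete_pct() {
    # $1: path to a short_summary_*.txt file
    sed -n 's/.*C:\([0-9.]*\)%.*/\1/p' "$1" | head -n 1
}

# Assumed output location for the command above; adjust if your layout differs.
summary=run_nema_busco_metazoa/short_summary_nema_busco_metazoa.txt
if [ -f "$summary" ]; then
    echo "Complete BUSCOs: $(busco_complete_pct "$summary")%"
fi
```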

Transrate

Transrate serves two main purposes. It can compare two assemblies to see how similar they are, or it can give you a score that represents the proportion of input reads that provide positive support for the assembly. We will use Transrate to get a score for the assembly, using the trimmed reads as input. For a further explanation of the metrics and of how to run the reference-based Transrate, see the documentation and the paper by Smith-Unna et al. 2016.
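
A minimal sketch of the read-based scoring run described above, assuming paired-end trimmed reads. The file names `left.fq` and `right.fq` and the output directory `transrate_reads` are placeholders, not names from this tutorial, and the wrapper `score_assembly` with its `DRY_RUN` switch is a hypothetical convenience so the command can be inspected before it is run. It relies on Transrate's `--left`, `--right`, and `--threads` options.

```shell
# Hypothetical wrapper: build the read-based Transrate command and either
# print it (DRY_RUN=1) or run it.
score_assembly() {
    # $1: assembly FASTA, $2: trimmed left reads, $3: trimmed right reads, $4: output dir
    set -- transrate --assembly="$1" --left="$2" --right="$3" --threads=2 --output="$4"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"      # show the command without running it
    else
        "$@"           # run transrate for real
    fi
}

# left.fq and right.fq are placeholder names for your trimmed read files.
DRY_RUN=1 score_assembly Trinity.fixed.fasta left.fq right.fq transrate_reads
```

Drop `DRY_RUN=1` from the last line to run the scoring itself.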

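  • How do two transcriptomes compare with each other? Run the reference-based comparison in both directions, so that each assembly serves as the reference once: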
transrate --reference=Trinity.fixed.fasta --assembly=trinity-nematostella-raw.fa --output=full_v_subset
transrate --reference=trinity-nematostella-raw.fa --assembly=Trinity.fixed.fasta --output=subset_v_full
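
To put the two comparisons side by side, it can help to pull a single metric out of each run's summary table. The sketch below assumes Transrate writes a comma-separated `assemblies.csv` with a header row into each output directory, and uses `reference_coverage` as an example column name. Treat both as assumptions and check the header of your own file. The helper `csv_column` is hypothetical and does not handle quoted fields that contain commas.

```shell
# Hypothetical helper: print the values of one named column from a simple CSV
# file with a header row (no quoted commas).
csv_column() {
    # $1: CSV file, $2: column name as it appears in the header
    awk -F, -v col="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
        idx     { print $idx }
    ' "$1"
}

# Assumed file and column names; skipped quietly if a file is not there.
for run in full_v_subset subset_v_full; do
    if [ -f "$run/assemblies.csv" ]; then
        echo "$run: $(csv_column "$run/assemblies.csv" reference_coverage)"
    fi
done
```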