Evaluating your transcriptome assembly

We will be using Transrate and BUSCO!

BUSCO

BUSCO stands for Benchmarking Universal Single-Copy Orthologs.
The Eukaryota database has 303 genes.
The Metazoa database has 978 genes.
A gene is reported as "Complete" when its length is within two standard deviations of the BUSCO group mean length.
The genes in the BUSCO set for each major lineage are selected from orthologous groups in which the gene is present as a single-copy ortholog in at least 90% of the species.
Useful links:
Website with additional busco databases: http://busco.ezlab.org/
Paper: Simão et al. 2015
User Guide
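
To make the "Complete" length criterion in the list above concrete, here is a minimal shell sketch that checks whether a match length falls within two standard deviations of a group mean. The function name `length_check` and all numbers are hypothetical, and BUSCO's own classification likely involves more than this single check, so treat it as an illustration of the arithmetic only.

```shell
# length_check MATCH_LEN MEAN_LEN SD_LEN
# Prints "within" if MATCH_LEN lies in [MEAN - 2*SD, MEAN + 2*SD], else "outside".
# Hypothetical helper for illustration; not part of BUSCO.
length_check() {
    awk -v len="$1" -v mean="$2" -v sd="$3" 'BEGIN {
        lo = mean - 2 * sd
        hi = mean + 2 * sd
        if (len >= lo && len <= hi) print "within"; else print "outside"
    }'
}

# Made-up example: group mean 400 aa, standard deviation 25 aa,
# so the accepted range is 350 to 450.
length_check 360 400 25
length_check 340 400 25
```

With these invented values, a match of length 360 falls inside the accepted range and one of length 340 falls outside it.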

Command:

run_BUSCO.py \
    -i Trinity.fixed.fasta \
    -o nema_busco_metazoa \
    -l ~/busco/metazoa_odb9 \
    -m transcriptome \
    --cpu 2
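
When the run finishes, BUSCO writes a short summary whose one-line notation gives the percentages of Complete (C), Single-copy (S), Duplicated (D), Fragmented (F) and Missing (M) BUSCOs, plus the total number searched (n). The sketch below shows one way to pull those numbers out in the shell. The sample line uses invented values, the helper `busco_pct` is hypothetical, and the summary file path in the comment is an assumption about this BUSCO version's output layout.

```shell
# Illustrative summary line in BUSCO's one-line notation; the numbers are made up.
# With a real run you might instead read it from the summary file, e.g. (path assumed):
#   summary_line=$(grep 'C:' run_nema_busco_metazoa/short_summary_nema_busco_metazoa.txt)
summary_line='C:85.0%[S:70.0%,D:15.0%],F:5.0%,M:10.0%,n:978'

# busco_pct KEY LINE: print the percentage that follows KEY (C, S, D, F or M).
# Hypothetical helper for illustration; not part of BUSCO.
busco_pct() {
    printf '%s\n' "$2" | sed -n "s/.*$1:\([0-9.]*\)%.*/\1/p"
}

complete=$(busco_pct C "$summary_line")
missing=$(busco_pct M "$summary_line")
total=$(printf '%s\n' "$summary_line" | sed -n 's/.*n:\([0-9]*\).*/\1/p')

echo "complete=${complete}% missing=${missing}% of ${total} BUSCOs"
```

Keeping the values in shell variables makes it easy to compare several assemblies side by side, for example by appending one line per assembly to a small table.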

Transrate

Transrate serves two main purposes. It can compare two assemblies to see how similar they are, or it can compute a score that represents the proportion of input reads providing positive support for the assembly. We will use Transrate to get a score for the assembly; use the trimmed reads as input. For a further explanation of the metrics and of how to run the reference-based mode, see the documentation and the paper by Smith-Unna et al. 2016.
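
A read-based scoring run might look like the sketch below. The read file names, the thread count, and the output directory name are placeholders rather than files from this lesson, so substitute your own trimmed paired-end reads. The script runs Transrate only if it is installed and the read files exist; otherwise it prints the command it would have run.

```shell
# Placeholder inputs: replace with your own trimmed, paired-end read files.
left_reads=left.trimmed.fq
right_reads=right.trimmed.fq

# Read-based scoring: the assembly plus the reads used to build it.
# Thread count and output directory name are arbitrary choices for this sketch.
cmd="transrate --assembly=Trinity.fixed.fasta --left=$left_reads --right=$right_reads --threads=2 --output=transrate_reads"

# Run only when transrate is on PATH and both read files are present;
# otherwise just show the command.
if command -v transrate >/dev/null 2>&1 && [ -f "$left_reads" ] && [ -f "$right_reads" ]; then
    $cmd
else
    echo "$cmd"
fi
```

The command string is expanded unquoted, so this sketch assumes file names without spaces.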

How do two transcriptomes compare with each other?

transrate --reference=Trinity.fixed.fasta --assembly=trinity-nematostella-raw.fa --output=full_v_subset
transrate --reference=trinity-nematostella-raw.fa --assembly=Trinity.fixed.fasta --output=subset_v_full