Quantification with Salmon

We will use Salmon to quantify expression. Salmon belongs to a newer class of RNA-seq quantification tools that avoid full read alignment, which makes it fast, and it accounts for transcript length when estimating abundance (Patro et al. 2015).

For further reading, see

  • Intro blog post: http://robpatro.com/blog/?p=248
  • A 2016 blog post evaluating and comparing methods here
  • Salmon github repo here
  • https://github.com/ngs-docs/2015-nov-adv-rna/blob/master/salmon.rst
  • http://angus.readthedocs.io/en/2016/rob_quant/tut.html
  • https://2016-aug-nonmodel-rnaseq.readthedocs.io/en/latest/quantification.html

Of the files Salmon writes for each sample, the two most interesting are salmon_quant.log and quant.sf. quant.sf contains the abundance estimates (TPM and estimated read counts); salmon_quant.log contains the log of the run.

We recommend quantifying against the Trinity transcriptome assembly FASTA file, which gives expression values for each contig. The resulting quant.sf looks like this:

Name                  Length    EffectiveLength    TPM    NumReads
TRINITY_DN2202_c0_g1_i1    210    39.818    2.683835    2.000000
TRINITY_DN2270_c0_g1_i1    213    41.064    0.000000    0.000000
TRINITY_DN2201_c0_g1_i1    266    69.681    0.766816    1.000000
TRINITY_DN2222_c0_g1_i1    243    55.794    2.873014    3.000000
TRINITY_DN2291_c0_g1_i1    245    56.916    0.000000    0.000000
TRINITY_DN2269_c0_g1_i1    294    89.251    0.000000    0.000000
TRINITY_DN2269_c1_g1_i1    246    57.479    0.000000    0.000000
TRINITY_DN2279_c0_g1_i1    426    207.443    0.000000    0.000000
TRINITY_DN2262_c0_g1_i1    500    280.803    0.190459    1.000912
TRINITY_DN2253_c0_g1_i1    1523    1303.116    0.164015    4.000000
TRINITY_DN2287_c0_g1_i1    467    247.962    0.000000    0.000000
TRINITY_DN2287_c1_g1_i1    325    113.826    0.469425    1.000000
TRINITY_DN2237_c0_g1_i1    306    98.441    0.542788    1.000000
TRINITY_DN2237_c0_g2_i1    307    99.229    0.000000    0.000000
TRINITY_DN2250_c0_g1_i1    368    151.832    0.000000    0.000000
TRINITY_DN2250_c1_g1_i1    271    72.988    0.000000    0.000000
TRINITY_DN2208_c0_g1_i1    379    162.080    1.978014    6.000000
TRINITY_DN2277_c0_g1_i1    269    71.657    0.745677    1.000000
TRINITY_DN2231_c0_g1_i1    209    39.409    0.000000    0.000000
TRINITY_DN2231_c1_g1_i1    334    121.411    0.000000    0.000000
TRINITY_DN2204_c0_g1_i1    287    84.121    0.000000    0.000000
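To make the columns more concrete, here is a minimal sketch that reads a quant.sf-style file with awk. It keeps only contigs with a nonzero NumReads and, for each one, divides TPM by NumReads / EffectiveLength. The demo file is an assumption for illustration: it is a temporary file holding four rows copied from the excerpt above, not real Salmon output, and the variable names are made up.

```shell
# Hypothetical demo file: a header plus four rows copied from the excerpt above.
demo=$(mktemp -d)
{
  printf 'Name\tLength\tEffectiveLength\tTPM\tNumReads\n'
  printf 'TRINITY_DN2202_c0_g1_i1\t210\t39.818\t2.683835\t2.000000\n'
  printf 'TRINITY_DN2270_c0_g1_i1\t213\t41.064\t0.000000\t0.000000\n'
  printf 'TRINITY_DN2201_c0_g1_i1\t266\t69.681\t0.766816\t1.000000\n'
  printf 'TRINITY_DN2208_c0_g1_i1\t379\t162.080\t1.978014\t6.000000\n'
} > "$demo/quant.sf"

# Skip the header (NR > 1) and contigs with no reads ($5 > 0).
# rate = NumReads / EffectiveLength; the last column printed is TPM / rate.
ratios=$(awk -F'\t' 'NR > 1 && $5 > 0 {
  rate = $5 / $3
  printf "%s\t%.4f\t%.2f\n", $1, rate, $4 / rate
}' "$demo/quant.sf")
echo "$ratios"
```

The last column comes out roughly the same for every contig, which is one way to see that TPM is proportional to the read count divided by the effective length, so a short contig with a few reads can have a higher TPM than a long contig with more reads.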

We use two Salmon commands here: salmon index and salmon quant. The first, salmon index, indexes the transcriptome:

salmon index --index nema --transcripts trinity.nema.full.fasta --type quasi

The second command, salmon quant, quantifies the trimmed reads (not the digitally normalized ones) against the indexed transcriptome:

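  for R1 in *R1*.fastq.gz
  do
    # Strip the trailing "extract.fastq.gz" to get the sample prefix
    sample=$(basename "$R1" extract.fastq.gz)
    echo "sample is $sample, R1 is $R1"
    # Derive the R2 file name by replacing the first "R1" with "R2"
    R2=${R1/R1/R2}
    echo "R2 is $R2"
    # -l IU: unstranded paired-end reads facing inward; -p 2: two threads
    salmon quant -i nema -p 2 -l IU -1 <(gunzip -c "$R1") -2 <(gunzip -c "$R2") -o "${sample}quant"
  done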
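Once the loop has finished, each sample has its own output directory containing a quant.sf. As a quick sanity check, the sketch below counts how many contigs received at least one read in each sample. Everything in it is illustrative: the two directories (sampleA.quant, sampleB.quant) and their tiny quant.sf files are made up and created in a temporary location, standing in for the real directories written by the loop above.

```shell
# Hypothetical layout: two fake output directories with tiny quant.sf files.
work=$(mktemp -d)
mkdir -p "$work/sampleA.quant" "$work/sampleB.quant"
printf 'Name\tLength\tEffectiveLength\tTPM\tNumReads\nc1\t210\t39.818\t2.5\t2\nc2\t213\t41.064\t0\t0\n' \
  > "$work/sampleA.quant/quant.sf"
printf 'Name\tLength\tEffectiveLength\tTPM\tNumReads\nc1\t210\t39.818\t1.0\t1\nc2\t213\t41.064\t3.0\t4\n' \
  > "$work/sampleB.quant/quant.sf"

# For every quant.sf, count the contigs whose NumReads (column 5) is above zero
# and print "<directory><TAB><count>".
summary=$(
  for f in "$work"/*quant/quant.sf
  do
    dir=$(basename "$(dirname "$f")")
    awk -F'\t' -v d="$dir" 'NR > 1 && $5 > 0 { n++ } END { print d "\t" (n + 0) }' "$f"
  done
)
echo "$summary"
```

To try this on real output, point the glob at the directory where the loop was run instead of the temporary one. A sample with far fewer expressed contigs than the others is worth a look in its salmon_quant.log.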