RNA-Seq Data Analysis

Introduction

RNA-Seq is a powerful next-generation sequencing application for transcript discovery, profiling, and quantitation. RNA-Seq can be used to study a wide range of RNA molecules, from coding mRNA to non-coding RNA (e.g., miRNA, small RNAs, lincRNAs). If your project focuses on small RNAs (~18-40 nt), please refer to Small RNASeq.

RNA-Seq analysis requires a known genome reference sequence, with or without annotation. If a reference sequence is not available for your organism of interest, please refer to De Novo Sequencing.

Workflow

The following is a list of common analysis items for RNA-Seq. One of our expert bioinformaticians will work closely with you to identify the custom analysis workflow most appropriate for your project.

1) Experiment design consultation
2) Data QC and clean up
3) Alignment to a reference with mapping statistics
4) Gene and transcript-based quantitation
     • RPKM/FPKM-based quantitation
     • Raw hit count-based quantitation
5) Differential expression analysis with p-value and FDR
     • Pair-wise comparison
     • Time course analysis
     • Clustering
     • Principal component analysis
6) Enrichment, pathway, and Gene Ontology (GO) analysis
7) Novel splicing variant analysis
8) Gene fusion and other structural variant analysis
9) SNP discovery and characterization
10) Written project report with analysis methods, publication-ready graphics, and references
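
The RPKM/FPKM-based quantitation in item 4 can be illustrated with a short calculation. This is a minimal sketch of the standard RPKM definition (reads per kilobase of transcript per million mapped reads), not the pipeline used for your project. The gene names, counts, lengths, and library size are hypothetical values chosen for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical raw hit counts and gene lengths (illustrative only).
counts = {"geneA": 500, "geneB": 500, "geneC": 2000}
lengths_bp = {"geneA": 1000, "geneB": 4000, "geneC": 2000}
total_mapped = 20_000_000  # hypothetical number of mapped reads in the library

rpkm_values = {g: rpkm(counts[g], lengths_bp[g], total_mapped) for g in counts}

for gene, value in rpkm_values.items():
    print(f"{gene}: raw count = {counts[gene]}, RPKM = {value:.2f}")
```

Here geneA and geneB have the same raw hit count, but geneB is four times longer, so its RPKM is four times lower. This is why length-normalized values and raw counts are reported as separate quantitation items.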

Turn-around Time

Upon data receipt, we usually finish a typical RNA-Seq analysis project in 2-3 days. The actual turn-around time, however, is highly dependent on sample number, data amount, and project complexity.

Publications

The publications below are representative research and review papers that show how RNA-Seq is employed in biomedical research.

  • Trapnell, C. et al. (2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 28(5):511-5.
  • Li, P. et al. (2010) The developmental dynamics of the maize leaf transcriptome. Nat Genet. 42(12):1060-7.
  • Wang, Z. et al. (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 10(1):57-63.

FAQ

Which should I choose for my RNA-Seq experiment, single-end or paired-end?
We always recommend paired-end runs for RNA-Seq experiments. The additional distance and orientation information provided by paired-end reads is required for effective detection of novel splicing variants, gene fusion events, and other changes. If your main objective is to perform differential expression analysis, or you are working with a genome with no splicing, single-end reads may be used as a low-cost alternative.
How does the read length impact RNA-Seq analysis results?
In general, long reads facilitate accurate and efficient mapping, especially for genomes with splicing. Long reads, however, cost more and take longer to generate. We generally recommend a minimum read length of 50 bp for RNA-Seq experiments.
How many reads are enough for my RNA-Seq experiment?
Although the sequence size of a given RNA population (e.g., the entire transcriptome) is relatively small, the copy number of each RNA sequence has a wide range. We recommend 40 million reads for mammalian genomes. If your main goal is novel splicing or gene fusion discovery, you should plan to acquire even more reads.

Note: RNA-Seq requires a large number of reads due to the great sequence and copy number diversity in any RNA sample. Please take care to preserve that diversity when handling your RNA sample. For instance, if you over-amplify your sequencing library via PCR, you risk reducing the diversity, and more reads might not improve the analysis results.
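
To see why copy number range drives the read requirement, the following rough sketch estimates how many reads a transcript is expected to receive at a given depth. It assumes reads are sampled in proportion to each transcript's share of the library, expressed here as fragments per million. That is a simplification that ignores transcript length, mapping losses, and library bias. The abundance values and function names are hypothetical.

```python
def expected_reads(total_reads, fragments_per_million):
    """Expected reads from one transcript, given its share of the library."""
    return total_reads * fragments_per_million / 1_000_000


def reads_needed(min_reads, fragments_per_million):
    """Depth at which the transcript is expected to yield min_reads reads."""
    if fragments_per_million <= 0:
        raise ValueError("fragments_per_million must be positive")
    return min_reads * 1_000_000 / fragments_per_million


depth = 40_000_000  # recommended depth for mammalian genomes (see above)

abundant = expected_reads(depth, 500)  # hypothetical highly expressed transcript
rare = expected_reads(depth, 1)        # hypothetical rare transcript
depth_for_rare = reads_needed(100, 1)  # depth for about 100 expected reads

print(f"abundant transcript: ~{abundant:.0f} reads")
print(f"rare transcript:     ~{rare:.0f} reads")
print(f"depth for 100 reads on the rare transcript: {depth_for_rare:.0f}")
```

Under these assumptions, the same run that gives thousands of reads to an abundant transcript gives only a few dozen to a rare one, which is why discovery-oriented projects benefit from additional depth.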
How does gene annotation in the reference impact my RNA-Seq experiments?
RNA-Seq requires a known reference genome sequence. Annotation is not required but most sequenced genomes come with gene annotation. It is crucial to understand the nature and accuracy of the annotation in order to analyze and interpret the RNA-Seq results correctly. Experimental evidence-based gene annotation greatly facilitates RNA-Seq data analysis. Software-predicted gene annotation, however, may interfere with RNA-Seq results. We recommend discarding software-predicted gene annotation in RNA-Seq analysis. Instead, gene structure can be reliably assembled based on the reference sequence and RNA-Seq data.
Should I perform rRNA removal in my RNA-Seq experiment?
Ribosomal RNA (rRNA) constitutes >90% of the total RNA. Without removing or reducing rRNA species in your RNA sample, the majority of the sequencing reads will cover rRNA, which will decrease both the coverage and quantitation accuracy of non-rRNA species. In addition, the excessive coverage of the rRNA genes will complicate the computational processes of RNA-Seq data analysis. Therefore, we recommend applying rRNA removal/reduction techniques if your main objective is to identify splicing variants and detect rare transcripts.

rRNA removal/reduction techniques, however, may introduce bias in quantifying highly abundant transcripts. If your main objective is absolute transcript quantitation, we recommend forgoing the rRNA removal/reduction steps. To increase the accuracy of transcript quantitation, we recommend looking into amplification-free and/or strand-specific protocols.
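
The effect of rRNA on usable sequencing depth is simple arithmetic, sketched below. The 90% figure comes from the answer above. The 5% residual rRNA after depletion and the 40 million read run size are hypothetical values chosen for illustration, and actual depletion efficiency varies by protocol and sample.

```python
def non_rrna_reads(total_reads, rrna_percent):
    """Reads left for non-rRNA species, using integer percent arithmetic."""
    if not 0 <= rrna_percent <= 100:
        raise ValueError("rrna_percent must be between 0 and 100")
    return total_reads * (100 - rrna_percent) // 100


total = 40_000_000  # hypothetical run size

untreated = non_rrna_reads(total, 90)  # total RNA, ~90% rRNA (from the text)
depleted = non_rrna_reads(total, 5)    # assumed 5% residual rRNA (hypothetical)

print(f"without rRNA removal: {untreated:,} informative reads")
print(f"with rRNA removal:    {depleted:,} informative reads")
```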
Why should I use replicates in my RNA-Seq experiments?
One of the key questions in differential gene expression analysis is whether the calculated difference truly reflects the difference between experimental and control sample groups, or is caused by random noise. Biological and technical replicates in the experimental design have been shown to increase detection precision and reduce noise. We urge researchers to employ replicates in any differential gene expression experiment. Please contact one of our expert bioinformaticians to discuss how to incorporate replicates cost-effectively in your project.
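
Replicates make it possible to compute a p-value for each gene, and because thousands of genes are tested at once, those p-values are usually adjusted to control the false discovery rate (the "FDR" in workflow item 5). The sketch below shows the Benjamini-Hochberg adjustment, one widely used FDR procedure. It is an illustration only, and the method applied in a given project may differ. The gene names and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from a pair-wise comparison.
genes = ["gene1", "gene2", "gene3", "gene4", "gene5"]
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]

fdr = benjamini_hochberg(p_values)
significant = [g for g, q in zip(genes, fdr) if q <= 0.05]

for g, p, q in zip(genes, p_values, fdr):
    print(f"{g}: p = {p:.3f}, FDR = {q:.5f}")
```

In this example gene3 and gene4 have raw p-values below 0.05 but adjusted values slightly above it, so only gene1 and gene2 pass a 5% FDR threshold.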