Formalin fixation and paraffin embedding (FFPE) of tissue biopsies is a commonly used method to preserve tissue specimens, resulting in a wide variety of globally accessible samples. These samples represent an almost endless biorepository for DNA, RNA and protein analyses waiting to be explored. During FFPE preservation, RNA undergoes chemical modification and fragmentation, and it degrades further during prolonged storage. As a result, FFPE-derived RNA is often of variable and typically low quality, with low yield and a high degree of degradation. This makes the analysis of RNA derived from FFPE samples particularly challenging.
Unlike other sequencing methods, total RNA sequencing provides an unbiased view of the entire transcriptome (excluding miRNAs) of a given sample, as it does not rely on specific capture probes, polyadenylation of transcripts or other modifications. Total RNA sequencing therefore enables the study of virtually all RNA species, including messenger RNAs (mRNAs) and long non-coding RNAs (lncRNAs). lncRNAs are emerging as important regulators of tissue physiology and are associated with disease processes, including cancer. Analyses of the lncRNA landscape may assist in understanding the biology of these disease processes and have potential for future biomarker development.
This tech note describes a technical assessment of a total RNA sequencing workflow for FFPE samples, with a focus on the quantification of lncRNAs.
To evaluate the performance of a total RNA sequencing approach with FFPE samples, we conducted total RNA sequencing of 4 colorectal cancer and 4 matching normal colorectal tissue samples. As is typical for FFPE samples, the quality of the isolated RNA was very low, with DV200 values (the fraction of fragments above 200 nt among all fragments in a sample) between 15% and 30%, which indicates strong fragmentation of the RNA (Figure 1).
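The fragment-size metric defined above (commonly written DV200) can be illustrated with a minimal sketch. In practice the value is reported by the electrophoresis instrument software from the full trace; the function below only reproduces the definition on a binned size distribution. The function name, the input layout and the toy numbers are assumptions made for illustration and are not taken from the study.

```python
def dv200(fragment_sizes_nt, abundances=None, threshold_nt=200):
    """Return the percentage of signal from fragments longer than threshold_nt.

    fragment_sizes_nt: fragment lengths (or bin centres) in nucleotides.
    abundances: optional signal per fragment/bin; defaults to 1 per fragment.
    """
    if abundances is None:
        abundances = [1.0] * len(fragment_sizes_nt)
    if len(fragment_sizes_nt) != len(abundances):
        raise ValueError("sizes and abundances must have the same length")
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    # Sum the signal of all fragments strictly above the threshold.
    above = sum(a for s, a in zip(fragment_sizes_nt, abundances)
                if s > threshold_nt)
    return 100.0 * above / total


# Hypothetical binned trace: most of the signal sits below 200 nt.
sizes = [50, 100, 150, 250, 500]
signal = [30, 30, 20, 15, 5]
print(f"DV200 = {dv200(sizes, signal):.1f}%")
```

With this toy distribution, 20 of 100 signal units lie above 200 nt, which falls inside the 15% to 30% range reported for the FFPE samples.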
Libraries for total RNA sequencing were successfully generated from 100 ng of total RNA input using Illumina's TruSeq stranded total RNA sequencing chemistry and were paired-end sequenced on a NextSeq 500 sequencer with a read length of 75 nucleotides. On average, more than 90 million reads were obtained per sample (range: 89 million to 95 million). Reads were aligned to a curated reference genome (based on Ensembl 75) extended with annotation of long non-coding RNAs (LNCipedia 3.1). Alignment was carried out with TopHat, quantification with HTSeq, and normalization and differential gene expression analysis with DESeq2. Due to the low quality of the RNA, only around 50% to 60% of the obtained reads could be mapped to the genome.
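The normalization step can be illustrated with a small sketch. DESeq2 estimates per-sample size factors with the median-of-ratios method; the pure-Python version below mirrors that idea on a toy scale and is not the DESeq2 implementation itself. The function names, the dict-of-lists input layout and the example counts are assumptions made for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if any(c <= 0 for c in gene_counts):
            continue
        # Log of the geometric mean of this gene across samples.
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The size factor of a sample is the median ratio over all usable genes.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide every raw count by the size factor of its sample."""
    sf = size_factors(counts)
    return {g: [c / s for c, s in zip(row, sf)] for g, row in counts.items()}


# Toy example: sample 2 was sequenced twice as deeply as sample 1.
raw = {"geneA": [10, 20], "geneB": [100, 200], "geneC": [5, 10]}
print(size_factors(raw))
print(normalize(raw))
```

After normalization the two toy samples carry identical values for every gene, which is the intended effect of correcting for sequencing depth.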
To assess the reproducibility of this workflow, library preparation for one FFPE sample was repeated and the resulting sequencing data analyzed as a technical replicate. A very high correlation (Pearson correlation coefficient r=0.99) between the normalized read counts of both replicates was observed (Figure 2 A).
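The replicate comparison rests on the Pearson correlation coefficient, which can be sketched as follows. The source does not state whether the normalized counts were log-transformed before correlation; the optional `log2p1` helper shown here is a common choice and is an assumption, as are the function names and the toy counts.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def log2p1(values):
    """log2(count + 1); an assumed transform that dampens highly expressed genes."""
    return [math.log2(v + 1) for v in values]


# Hypothetical normalized counts for the same genes in two library preparations.
replicate_1 = [12.0, 150.0, 0.0, 980.0, 45.0]
replicate_2 = [10.0, 161.0, 1.0, 1010.0, 40.0]
print(round(pearson_r(log2p1(replicate_1), log2p1(replicate_2)), 3))
```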
Next, for expression analysis, data from all 8 samples were normalized to equal sequencing depth. At a sequencing depth of approximately 90 million reads per sample, over 15,000 mRNAs and over 9,000 lncRNAs were detected with an average of more than 10 normalized read counts (Figure 2 B). Differential expression analysis comparing the 4 colorectal tumor samples with the 4 normal samples identified 2,296 significantly differentially expressed mRNAs and 781 significantly differentially expressed lncRNAs (FDR < 5%; Figure 2 C).
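The two filters used in this paragraph, a detection threshold on the mean normalized count and an FDR cut-off, can be sketched as follows. The source does not state how the FDR was computed; the sketch assumes Benjamini-Hochberg adjustment, which is the DESeq2 default. Function names and toy values are assumptions made for illustration.

```python
def detected_genes(normalized, min_mean=10):
    """Genes whose mean normalized count across samples exceeds min_mean."""
    return [g for g, row in normalized.items()
            if sum(row) / len(row) > min_mean]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical normalized counts and raw p-values.
normalized = {"geneA": [12, 9, 15], "geneB": [1, 2, 3]}
print(detected_genes(normalized))

p_raw = [0.01, 0.04, 0.03, 0.5]
p_adj = benjamini_hochberg(p_raw)
significant = [i for i, p in enumerate(p_adj) if p < 0.05]
print(p_adj, significant)
```

In the toy example only the first p-value survives the 5% FDR cut-off, even though three raw p-values are below 0.05, which shows why the adjusted values rather than the raw ones are used for calling genes.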
Among the differentially expressed mRNAs, several genes were identified that are known to play an essential role in colorectal cancer and are included in the KEGG colorectal cancer pathway* (Figure 2 C). Additionally, functional annotation using Gene Set Enrichment Analysis (GSEA) showed that the differentially expressed genes are significantly enriched in gene sets for different cancer types or cancer-related pathways (e.g. cell cycle regulation, cell proliferation and the hepatoblastoma pathway; FDR < 5%).
Examples of induced expression, for MYC (a gene involved in colorectal cancer) and for a lncRNA, are shown in Figure 3. Because total RNA sequencing also captures pre-mRNA transcripts from which introns have not yet been removed by splicing, a fraction of reads aligns to intronic regions (Figure 3). Furthermore, total RNA sequencing detects reads that span splice junctions, which could be used to inspect alternative splicing events.
To further assess the performance of total RNA sequencing on FFPE samples, the results were compared to mRNA capture sequencing data of the same samples (see Tech Note "Study the transcriptome in FFPE tissue using mRNA capture sequencing"). To this end, mRNAs were selected that could be identified with both methods (n=20,814). This number is substantially lower than the over 80,000 genes that can be detected with total RNA sequencing, because the probe-based mRNA capture approach does not cover all coding genes (it covers 98.3% of the RefSeq exome of the hg19 reference genome) and covers no lncRNAs.
Data for all 16 samples were normalized on the basis of the common mRNAs, and expression analysis was performed with DESeq2 (Figure 4). In total, more than 14,000 genes were reproducibly detected with an average of more than 10 normalized read counts when sequenced at the same depth. Around 95% of the mRNAs were identified by both methods (Figure 4 A).
Differential gene expression analysis was performed separately for the two datasets (mRNA capture and total RNA sequencing) and revealed 2,805 differentially expressed genes with total RNA sequencing. When these were compared to their counterpart values from mRNA capture sequencing, around 89% were concordant in both log2 fold change and direction (up- or downregulated), around 11% agreed in direction but differed by more than 1 in log2 fold change, and less than 1% were not concordant.
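The three concordance classes described above can be expressed as a small classification rule. This is a sketch of one plausible reading of the text: genes with opposite signs of log2 fold change are treated as not concordant, and a log2 fold change of exactly 0 is treated as not upregulated. The function names, class labels and toy values are assumptions and do not come from the study.

```python
from collections import Counter


def classify_concordance(lfc_total, lfc_capture, max_diff=1.0):
    """Classify one gene by comparing its log2 fold changes in both methods."""
    same_direction = (lfc_total > 0) == (lfc_capture > 0)
    if not same_direction:
        return "discordant"
    if abs(lfc_total - lfc_capture) <= max_diff:
        return "concordant"
    # Same direction, but the effect sizes differ by more than max_diff.
    return "direction_only"


def concordance_summary(pairs, max_diff=1.0):
    """Percentage of genes per class.

    pairs: dict mapping gene -> (lfc_total_rna, lfc_mrna_capture).
    """
    if not pairs:
        raise ValueError("no genes to compare")
    labels = Counter(classify_concordance(t, c, max_diff)
                     for t, c in pairs.values())
    n = sum(labels.values())
    return {label: 100.0 * count / n for label, count in labels.items()}


# Hypothetical log2 fold changes (total RNA sequencing, mRNA capture).
toy = {
    "gene1": (2.0, 1.5),    # same direction, difference 0.5
    "gene2": (3.0, 1.0),    # same direction, difference 2.0
    "gene3": (1.0, -0.5),   # opposite direction
    "gene4": (-2.0, -2.4),  # same direction, difference 0.4
}
print(concordance_summary(toy))
```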
Taken together, mRNA capture and total RNA sequencing detect highly similar and largely overlapping sets of genes and identify them as differentially expressed, despite the drastically different underlying methods. This agreement confirms the results obtained by total RNA sequencing on FFPE samples.
Total RNA sequencing provides an unbiased and comprehensive view of the transcriptome of a given sample, including RNA species that cannot be analyzed by other sequencing methods. Biogazelle has optimized a workflow utilizing total RNA sequencing for clinically relevant and challenging FFPE samples. When this approach was applied to a matched set of colorectal cancer and normal colon tissue, many genes were identified in a highly reproducible manner, including genes known to be involved in this cancer type. Many lncRNAs were also among the significantly differentially expressed genes. Despite emerging knowledge of their involvement in disease processes, lncRNAs remain to a large extent uncharted territory with great potential to serve as future biomarkers.