Formalin fixation and paraffin embedding (FFPE) is the clinical standard for preparing tissue samples for histopathological assessment. Such samples represent a vast repository of tissue material, often with long-term clinical follow-up. With the advent of high-throughput molecular profiling technologies, there is a unique opportunity to screen and comprehensively evaluate biomarkers. Such studies typically require a large sample size and long-term outcome data, both key features of FFPE tissue archives. Unfortunately, the process of tissue fixation induces chemical changes and fragmentation in both DNA and RNA, making subsequent analysis unreliable. Because of this highly fragmented state of RNA in FFPE tissue, most gene expression studies have instead focused on intact RNA from fresh frozen (FF) material.
To make use of the rich resource of FFPE specimens, Biogazelle previously developed a sensitive and accurate method for targeted gene expression analysis on FFPE tissue using a dedicated RT-qPCR workflow, compatible with fragmented and low input RNA samples. To accommodate unbiased mRNA gene expression profiling, we have now successfully implemented a workflow for mRNA capture sequencing on FFPE tissue using the TruSeq RNA Access Library Preparation Kit (Illumina). Using proven TruSeq stranded RNA library preparation chemistry combined with efficient sequence-specific exon capture, the TruSeq RNA Access Library Preparation Kit generates RNA sequencing libraries from degraded samples that focus on the protein coding regions of the transcriptome. Isolating these high-value content regions maximizes discovery power, while requiring only a fraction of the read depth of total RNA sequencing.
This tech note describes the technical assessment of the workflow implementation and zooms in on differential gene expression in colon cancer compared to normal colon FFPE tissue.
To assess the performance of the workflow, we performed mRNA capture sequencing on RNA isolated from 4 colon cancer FFPE samples and 4 matching normal colon FFPE samples. The isolated RNA was of particularly low quality, with DV200 values (the percentage of RNA fragments >200 nucleotides) between 8% and 26% (Table 1 and Fig. 1). Libraries for mRNA capture sequencing were prepared starting from 100 ng of total RNA using the TruSeq RNA Access Library Preparation Kit (Illumina) according to the manufacturer's instructions. Paired-end sequencing was performed on a NextSeq 500 instrument (Illumina) with a read length of 75 base pairs to a sequencing depth of 50 million read pairs. Reads were mapped to the reference genome (Ensembl 78) with TopHat, genes were quantified with HTSeq, and raw read counts were normalized using the DESeq size factor.
|colon cancer 1||2.2||26|
|normal colon 1||2.3||18|
|colon cancer 2||2.5||15|
|normal colon 2||2.6||11|
|colon cancer 3||2.3||15|
|normal colon 3||2.5||20|
|colon cancer 4||2.4||23|
|normal colon 4||2.6||8|
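The size-factor normalization mentioned in the methods above can be illustrated with a minimal sketch of the median-of-ratios idea behind DESeq size factors. This is a toy re-implementation in plain Python for illustration only, not the DESeq package itself; the function names `size_factors` and `normalize` and the input layout (sample name mapped to a list of raw counts in a shared gene order) are assumptions made for this example.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample with the median-of-ratios approach.

    counts: dict mapping sample name -> list of raw read counts,
            with genes in the same order for every sample (assumed layout).
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean per gene across samples; genes with a zero count
    # in any sample are skipped because their geometric mean is zero.
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    factors = {}
    for s in samples:
        # Ratio of each gene's count to its geometric mean, on the log scale.
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        # The median ratio is the size factor; median() raises
        # StatisticsError if no gene is expressed in every sample.
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize(counts):
    """Divide each sample's raw counts by that sample's size factor."""
    factors = size_factors(counts)
    return {s: [c / factors[s] for c in counts[s]] for s in counts}
```

In this sketch, a sample sequenced twice as deeply as another receives a size factor twice as large, so the normalized counts of the two samples become directly comparable.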
Messenger RNA capture sequencing using the TruSeq RNA Access Library Preparation Kit (Illumina) results in stranded sequencing data with high exonic coverage (Fig. 2). Stranded sequencing data allows for precise measurement of strand orientation, enhancing transcript annotation, and increasing alignment efficiency.
An interesting feature of RNA sequencing is the ability to identify alternative splicing events. Although the exonic capture probes in the TruSeq RNA Access Library Preparation Kit are not designed to cover splice junctions, a high fraction of reads map to splice junction sequences (Fig. 3). As a consequence, mRNA capture sequencing using the TruSeq RNA Access Library Preparation Kit is highly efficient for detecting alternative splicing events.
To assess the reproducibility of our workflow, a technical replicate of one FFPE sample was included in the entire workflow. We observed excellent correlation between the normalized gene expression counts from technical replicates of a low-quality normal colon FFPE sample (Fig. 4A). At a sequencing depth of 20 million subsampled read pairs, we detected on average 14,241 mRNA genes (with a minimum read count of 10) per sample. Next, we analyzed the gene expression data of the 4 matched colon tumor-normal pairs. We found 2,738 genes to be differentially expressed between colon cancer and normal colon samples (FDR <5%).
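The detection criterion used above (a minimum read count of 10) can be sketched as follows. This is a minimal illustration, assuming that "minimum read count of 10" means a count of at least 10 and that counts are available as a per-sample mapping from gene to read count; the function names and the toy gene names in the usage example are hypothetical.

```python
def detected_genes(gene_counts, min_count=10):
    """Return the set of genes whose read count reaches the threshold.

    gene_counts: dict mapping gene identifier -> read count for one sample.
    min_count:   detection threshold (assumed inclusive, i.e. count >= 10).
    """
    return {gene for gene, count in gene_counts.items() if count >= min_count}


def mean_detected(samples, min_count=10):
    """Average number of detected genes per sample.

    samples: dict mapping sample name -> gene_counts dict.
    """
    per_sample = [len(detected_genes(c, min_count)) for c in samples.values()]
    return sum(per_sample) / len(per_sample)


# Toy usage with made-up counts (not data from this study).
toy_samples = {
    "sample_1": {"GENE_A": 12, "GENE_B": 3, "GENE_C": 10},
    "sample_2": {"GENE_A": 0, "GENE_B": 50, "GENE_C": 9},
}
print(mean_detected(toy_samples))
```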
Given that mRNA capture sequencing of the matched colon tumor-normal FFPE pairs resulted in a high number of differentially expressed genes, we next evaluated how these results compare to RNA sequencing data from fresh frozen (FF) colon cancer samples. To this end, we made use of publicly available RNA sequencing data of four FF colon tumor-normal sample pairs from The Cancer Genome Atlas (TCGA). Normalized polyA+ RNA sequencing data were downloaded from the TCGA data portal for selected tumor-normal pairs (Table 2).
| ||Biogazelle data||TCGA data|
|Samples||4 pairs||4 pairs|
|Library prep||mRNA capture||polyA+ selection|
|Sequencing depth||~50 million||~50 million|
|Pipeline||TopHat + HTSeq||MapSplice + RSEM|
Differential gene expression analysis between FF colon tumor and normal samples identified 5,200 differentially expressed genes (FDR <5%), of which 1,753 (33.7%) were also detected as differentially expressed in the FFPE data. We then evaluated how concordant the log fold changes are between FFPE and FF data for the genes differentially expressed in the FFPE data. Two aspects were evaluated when comparing fold changes obtained in both sample types: log fold change concordance and directional concordance. The former is measured as the absolute difference between the log2 fold changes obtained for each gene in the two datasets; this difference should be lower than 1 (log2 scale) for concordance. Directional concordance is measured by comparing the sign of the fold change obtained in the two datasets. A high fraction (72.5%) of genes differentially expressed in the FFPE data (FDR <5%) show concordance in both log fold change and direction compared to the FF data (Fig. 5). In addition, 23.7% of genes differentially expressed in the FFPE data show concordance in direction only. Only a small minority (3.8%) of genes differentially expressed in the FFPE data show no concordance in fold change or direction compared to the FF data. This high concordance in differentially expressed genes is reflected in the pathways that are deregulated in both datasets (data not shown).
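The two concordance criteria described above can be sketched in a few lines of Python. This is an illustrative sketch, not the analysis code used for this tech note. The function names, the category labels and the toy fold changes are hypothetical. It also assumes that a gene whose fold change has opposite signs in the two datasets is counted as non-concordant even when the absolute difference is below 1, a case the text does not spell out.

```python
from collections import Counter


def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def classify(lfc_ffpe, lfc_ff, max_abs_diff=1.0):
    """Classify one gene from its log2 fold changes in FFPE and FF data."""
    # Directional concordance: both fold changes must have the same sign.
    # Assumption: a sign mismatch (or a zero fold change) counts as "none".
    if sign(lfc_ffpe) == 0 or sign(lfc_ffpe) != sign(lfc_ff):
        return "none"
    # Fold change concordance: absolute log2 difference below the threshold.
    if abs(lfc_ffpe - lfc_ff) < max_abs_diff:
        return "fold change and direction"
    return "direction only"


def concordance_fractions(pairs):
    """Fraction of genes per category.

    pairs: dict mapping gene -> (log2 FC in FFPE, log2 FC in FF).
    """
    labels = Counter(classify(a, b) for a, b in pairs.values())
    total = sum(labels.values())
    return {label: n / total for label, n in labels.items()}


# Toy usage with made-up log2 fold changes (not data from this study).
toy_pairs = {
    "GENE_A": (2.0, 2.5),    # same sign, difference 0.5
    "GENE_B": (-1.5, -3.0),  # same sign, difference 1.5
    "GENE_C": (1.2, -0.4),   # opposite sign
    "GENE_D": (-2.2, -1.9),  # same sign, difference 0.3
}
print(concordance_fractions(toy_pairs))
```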
Despite the many differences between the FFPE and FF datasets (Table 2), we observe a very good correspondence between the two. Thus, gene expression changes measured using mRNA capture sequencing on FFPE samples closely reflect those measured using polyA+ RNA sequencing on FF samples.
To enable mRNA biomarker discovery in FFPE tissue, Biogazelle has implemented a workflow for mRNA capture sequencing. Using a highly reproducible and sensitive method, we detect known and potentially new mRNA biomarkers for colon cancer based on the study of FFPE tissue. This optimized workflow enables researchers to apply the power of next-generation sequencing technology to mRNA expression studies on RNA isolated from FFPE samples. Beyond gene expression analysis, RNA capture sequencing can also be used for discovery applications such as identifying alternative splicing events, fusion genes and expressed mutations. Taken together, mRNA capture sequencing opens new and powerful ways of analyzing mRNA from FFPE samples. Reach for your FFPE archives as patients from the past can provide solutions for the future.