Exploiting tumor RNA seq data for prediction of immune checkpoint inhibition through TMB & MSI assessment


The following is a transcript of a presentation given by Biogazelle CSO, Prof. Jo Vandesompele during the 4th Annual Advances in Immuno-Oncology conference (London, UK, May 20th 2019)

Good afternoon everyone. Thank you for the kind introduction and for giving me the opportunity to present. Biogazelle is a genomics services company focusing on RNA. In a quality environment (ISO17025/GCLP), we perform customized transcriptome analyses on clinical samples such as liquid biopsies and FFPE tissues. We genuinely believe in the power of RNA for its ability to accurately inform on cellular states. I sometimes refer to DNA as a static painting (here by Peter Paul Rubens with a portrait of his 5-year-old daughter Clara) and to RNA as real-time snapshots or still images from a video. This one features Lotte from birth until the age of 18. The girl's DNA has not changed in that period, but her 'expression' has. RNA is likely one of the most dynamic and responsive information layers of life.

Most of you are familiar with the study of RNA in terms of differences in abundance, such as differential gene expression analysis, for single genes, or multigene signatures in a heatmap, or enrichment analyses to assess if a pathway is active or repressed. However, an essential part of the RNA language is structural. As such, we should study splice isoforms, circularized transcripts, fusion transcripts, SNVs, indels, and RNA editing. We have dedicated pipelines to study these structural aspects.

This grid provides an overview of how different types of omics data (rows) are used for individual genomic or immunogenomics analyses (columns) to study tumor-immune cell interactions. I made a small modification to the original figure in Nature Reviews Genetics, as RNA can effectively be used for gene copy number analyses. In fact, there's nothing that RNA cannot provide that DNA WES/WGS can. What's more, RNA seems far superior in the context of immunogenomic analyses. In addition to gene expression analyses, it allows the assessment of mutations, CNVs, HLA-typing, antigens, tumor-infiltrating lymphocytes, and T-cell receptor repertoire. Collectively, the RNA analyses help to identify molecular mechanisms, predictive biomarkers, drug targets, and novel therapeutic strategies for cancer.

So why do we need to understand tumor-immune cell interactions better? Well, immune checkpoint inhibitors (and other immunotherapies) have demonstrated durable clinical responses across several tumor types, but only a fraction of patients respond. As these therapies are costly and may present toxic side effects, biomarker-driven selection of patients that will respond is needed, in order to increase response rates, save costs, and avoid unnecessary side effects. The IO biomarker toolset is expanding: some point at tumor cell intrinsic, others at extrinsic factors; some are well validated, others not; some are valid across multiple cancer types (so-called pan-cancer biomarkers), others seem to hold for only one cancer type. Transcriptome sequencing has become routine, but tumor material is often limited, and separate tests of different biomarkers add complexity, cost and time, a comprehensive RNA based test seems very attractive. In my presentation, I present some of our attempts to accomplish that goal. I show that we can detect or quantify MSI, eTMB, individual gene expression levels and pathway of multigene signatures, and immune cell infiltration.

Microsatellites are repeated sequences of DNA. These sequences consist of repeating units of 1 to 6 basepairs in length. MSI is the condition whereby the length of the repeating units changes, because of errors in the DNA mismatch repair machinery. The US FDA has granted accelerated approval to pembrolizumab for pediatric and adult patients with microsatellite instability-high (MSI-H) or mismatch repair-deficient solid tumors. It is the first time the agency has approved a cancer treatment based on a common biomarker rather than an organ-based approach. In our attempt to develop a comprehensive RNA based IO test, we built a pipeline to detect and quantify microsatellite instability from RNA seq data. This is work from Dr. Steve Lefever, a postdoc in my Ghent University lab. The fundamental principle is that repeat containing reads are identified, and that unique sequences adjacent to the repeat are split and used separately for mapping. Only uniquely mapping reads are used, followed by comparing the observed repeat length with the expected repeat length in the genome.

Considering all possible mono/di/tri/tetra repeats with at least 4 repeating units, approximately 65 million repeats are present in the human genome. Most are located in intergenic or intronic regions. About half a million can be detected in RNA seq data. As expected, most of these are in the UTR (untranslated) and CDS (coding sequence) regions of our genes.

Let's now look in more detail into one of these repeats, as an example a mononucleotide A10 repeat in the UTR of genes. In the x-axis, you see the difference in length between the observed and the expected repeat length. In the y-axis, you see the number of reads that show that particular length, expressed as a density function. This way of visualization creates electropherogram-like plots that people are used to interpreting for MSI assessment. The orange result is from a microsatellite stable (MSS) sample, and blue is an microsatellite instability high (MSI-H) sample. We can discern 4 differentiating features: the location of the dominant peak, the width of the electropherogram, the imbalance of peaks adjacent to the dominant peak, and the overall mean length difference of all peaks.

All these repeat features are markedly different when looking at RNA seq data from ~200 colon cancer samples from The Cancer Genome Atlas. Here I show boxplots for T13 intergenic repeats in MSI-H (blue) and MSS samples, classified according to TCGA using standard DNA-based MSI test. It is well established that mononucleotide repeats are generally shorter in MSI-H samples.

In the previous plot, I showed the length difference result for one particular repeat, i.e., intergenic mononucleotide T13. We calculated the significance for all detected repeats in the colon cancer samples and looked at the 4 discriminating features (i.e., the 4 blocks). The columns in each block denote the annotation relative to the transcriptome (i.e., CDS, intronic, UTR, intergenic, or combined) and the rows are different repeats. The heatmap color denotes significance (all are p<0.05, corrected for multiple testing). While significant repeats can be found for all annotations and all repeat features, we observe the highest number of significant repeats when considering the average difference in observed vs. expected length. Also, in line with the literature, mononucleotide A and T repeats seem to be the most promising classifiers. Several of them are significantly different between MSI-H and MSS for the 4 features (length, position, imbalance, and width). Of note, we see similar results for other cancer types, e.g., stomach and uterine/endometrium cancer.

Next, we wanted to assess if we could classify a single RNA seq sample with respect to its MSI status. For this purpose, we split the colon cancer TCGA data set into a training set of 20% and a test set of 80%. Using precision/recall curves, we determined the optimal cut-point in the training data and then classified the test samples. Here you see the result for a T17 mononucleotide repeat across all annotation features. Except for 3 MSI-H samples that are classified as MSS, all samples are correctly classified. The classification performance is excellent, with 100% PPV and almost 98% NPV. The balanced accuracy is 96%. Of note, we did not attempt yet to combine different repeats to create an ensemble classifier. One could expect that the classification accuracy (which is already very high) should even be better.

Next, we wondered how deep cDNA libraries should be sequenced for accurate MSI classification. Here, I show the result for one example repeat; this is a mononucleotide repeat T11 in the UTR of human genes. We subsampled the original RNA seq data from 20 M reads for each sample down to 6400 reads per sample. Down to 32,000 reads per sample, the difference in repeat length for T11 in UTR remains significantly different between MSI-H and MSS samples. We conclude that RNA based MSI detection is robust against variable read depth among samples and does not require many reads.

Finally, we wanted to assess the impact of tumor purity. To mimic a range of different tumor purities, we used RNA seq data from 16 colon cancer cell lines from the CCLE database and matched them in 8 pairs of MSI-H and MSS cell lines. The results are for mononucleotide repeat A16 across all annotation features. When we dilute MSI-H RNA down to 3% with MSS RNA, the repeat length difference remains significant. This strongly suggests that tumor cell purity is likely not going to be a confounding variable in the RNA seq based MSI detection. One final remark on this application, we don't propose to perform tumor RNA sequencing for the sake of MSI detection. Instead, we propose to assess MSI status as soon as tumor RNA seq data are available, likely generated for other purposes.

Let's shift gears now and look at the ability to call mutations using RNA seq data only. Compared to the more established practice of DNA based mutation analysis (e.g., through whole exome or whole genome sequencing), there are particular challenges of using RNA for this purpose. First, the read coverage is a function of the individual gene expression levels, so we observe regions covered once or thousands of times. Next, RNA editing results in numerous single nucleotide variants, that are not present at the DNA level. RNA editing is a common phenomenon of post-transcriptional alteration of specific nucleotides at specific locations; these alterations are thus not encoded in the genome. The most frequent alterations are A > G and C > T, caused by specific cellular enzymes. RNA editing plays a role in normal development, cellular homeostasis, and disease. It is an often underappreciated level of complexity. Importantly, RNA editing can result in neoantigens in cancer, i.e., the incorporation of a different amino acid. Third, split reads (because of splicing) may result in mapping errors and hence perceived sequencing errors around the splice junctions. Fourth, randomly primed cDNA results in sequence mismatches at the location of the random primers. And finally, matched normal RNA is not always available, and when it is, the relevance can be questioned. How similar is, for instance, normal breast tissue compared to ductal breast cancer cells? Clearly, these transcriptomes are widely different, and not all genes can serve as a normal reference. For all these reasons, we build our computational pipelines without the need for matched normal RNA.

This is a schematic illustration of our RNA seq variant calling pipeline. It allows fusion gene detection using FusionCatcher and SNV and indels using VarScan. The key points are the removal of PCR duplicated reads, the extensive filtering of reads and variants, and the accurate annotation of the remaining variants. The pipeline is under constant development. Especially the variant calling on liquid biopsy-derived RNA (e.g., from plasma) is particularly challenging because of low coverage and low allele frequencies. For this, error correction tools are mandatory.

As nucleotide coverage in RNA seq data hugely varies and can be quite low, we determined the theoretical probability that a variant is detected given a particular variant allele frequency, a certain depth (i.e., the total coverage) and a minimal coverage of 3 for a given variant. As you can see, with a VAF of 50% (the green line) and total coverage of 5 unique reads, there is a 50% chance that we call the variant given at least 3 independent reads for the variant. With a total coverage of 10, we will call almost 95% of the variants. However, if the VAF is only 20%, then only 6% of the variants will be called at a total coverage of 5. We need a total coverage of 20 to call 80% of the variants with at least 3 variant reads. Hence, VAF and positional coverage have a major impact on variant calling.

To assess the precision and recall of our method, we compare our RNA seq variant calls with the high confidence DNA variants determined by the Genome in a bottle consortium on Coriell cell line GM12878. Of note, we only included variants that are covered with at least 5 independent reads, to make it a fair comparison, because the majority of DNA variants are not expressed. The overlap between 2 replicated RNA seq data sets and the confidently called DNA variants is very high; both methods call 17,495 SNPs. The small number of SNPs called by DNA only (less than 6%) are likely mono-allelically expressed variants or variants in genes with low coverage, resulting in stochastic sampling noise not picking up the variant with at least 3 reads. The 4500 variants that are called in both RNA seq samples are mostly RNA editing variants, obviously not present in the DNA.

As a final sanity check of our pipeline, we sequenced RNA from 16 FFPE tumor blocks with confirmed diagnostic mutations. These include 4 fusion genes, SNVs in KRAS, BRAF, EGFR and PDGFRA, and deletions in EGFR and KIT. All mutations were correctly called.

As indicated before, we wished to build our pipeline without the need for matched normal RNA or DNA. For somatic variant calling, required to determine the tumor mutation burden, this necessitates careful and stringent filtering of variants. Using a combination of database annotations for known mutations (in COSMIC) and SNPs in the normal population (gnomAD), complemented with a strand-specific prediction of RNA editing events and A VAF filter, we zoom in on the somatic variants. To benchmark our somatic pipeline, we used matched RNA seq and DNA exome seq data from TCGA. As a gold standard reference, we used the somatic calls from TCGA using matched tumor and normal DNA. Here you see a boxplot of the VAFs for 2 bladder cancer samples, stratified for being a true or false positive. It's clear that false positives are characterized by very low VAF. Based on these results, we use 20% as minimal VAF for calling.

Next, we calculated the expressed tumor mutation burden (or eTMB) as the number of non-synonymous mutations per megabase of expressed transcriptome.
We did this for 10 bladder cancer samples and 50 lung adenocarcinoma samples. Overall, the correlation is very high, but mind a different scale, so any thresholds will need to be adjusted. While the correlation is high, at a given eTMB of say 2.5 in lung cancer, the TMB values may differ by a factor 3. This could point at some relevant difference between these 2 scores. Larger datasets with corresponding IO therapy response are required to determine if eTMB is as good or better at predicting response. Of note, in these 2 cancer types, we did not notice a marked difference if we included RNA editing induced non-synonymous mutations or not. We are currently analyzing more datasets to what extent RNA editing induced non-synonymous variants impact eTMB scores.

Analogous to the MSI results, we also assessed the impact of sequencing depth. It is clear from this figure that reducing sequencing depth from 50 down to 20 million reads is not affecting the eTMB scores. Again, this strongly suggests that eTMB is robust against variable read depth among samples.

Also tumor purity was assessed. Here, we mixed the matched normal tissue RNA with the original tumor RNA. As we have no clear idea of the tumor purity in the original RNA seq data, we can only say that adding 40% normal RNA does not seem to have a major impact on the eTMB scores. Of note, some samples seem to be more effected by tumor purity than others. Nevertheless, we recommend to work with as high tumor purities as possible, to maintain high detection sensitivity.

I would like to end with the exploitation of RNA seq data to estimate the identity and abundance of infiltrating immune cells in bulk tumor RNA seq data, and with the integration of this information with other biomarkers. Emerging results suggest that the type and abundance of certain immune cells (e.g. cytotoxic T-cells) corresponds with response to immune checkpoint inhibition. Deconvolution is the computational equivalent of single-cell sequencing and estimates fractional cell abundances. For this to work, signature gene sets are required for all the cell types of interest. Then, mathematical algorithms deconvolute the tumor mixture data into fractions.

Using the same lung cancer data as for eTMB assessment, we now estimated the fraction of these 6 immune cell types using deconRNAseq and the LM6 gene signature. This result strongly suggests that there are 2 different groups of lung cancer, with CD8+ T-cells present in only one of them.

Doing the same for the bladder cancer samples and integrating the immune cell abundances with PD1 and PDL1 expression levels (as assessed by RNA seq), we see strong and significant correlations. Amongst others, the strong association between eTMB and CD8 cell type abundance is interesting. Of course, the numbers are small, and larger datasets will be required to confirm this association and determine the added value for more accurate prediction of response.

As a final example of using RNA seq data to integrate eTMB with a predictive gene signature, we did similar analyses as in a recent Science paper. We used the same T-cell inflamed gene expression signature that contains IFN-gamma–responsive genes related to antigen presentation, chemokine expression, cytotoxic activity, and adaptive immune resistance. They demonstrated that objective response rates were highest in the TMB-high and gene-signature-high group. We also determined the correlation between these 2 features and stratified the 50 lung cancer samples in 4 groups. The 5 pink patients are predicted to have the highest response rate compared to eTMB or GEP only high patients.

I want to conclude by stating that I strongly believe in a comprehensive test that simultaneously measures different biomarker aspects. This is supported by a large study in which TMB (using a 592 gene panel), PDL1 (using IHC) and MSI (using NGS based test) were assessed. While this is not a comprehensive RNA test, it certainly shows that these predictive markers provide complementary information. In fact, less than 1% is positive for all 3 predictive biomarkers. Among the 11,348 cases examined, the overall rates of MSI‐H, TMB‐high, and PD‐L1 positivity were 3.0%, 7.7%, and 25.4%, respectively. Thirty percent of MSI‐H cases were TMB‐low (blue), and only 26% of MSI‐H cases were PD‐L1 positive (pink). Of note, there are marked differences among tumor types.

At Biogazelle, we postulate that RNA will play a central role in the assessment of tumor intrinsic and extrinsic factors to predict response to IO therapies. Further, the IO biomarker toolset is growing, but the average patient sample size is often too small, biomarkers need independent confirmation, their pan-cancer potential should be assessed, and different types of markers should be integrated.
Biogazelle is ready to support your studies, using validated methods in a quality-controlled environment.

Download the Presentation Slides

Subscribe to email updates