Although RNA-targeting molecules such as ASOs or siRNAs are designed with sequences specific to their target RNA, they often exhibit non-specific regulation of so-called off-target genes. Identifying such off-target genes has several relevant applications. One is repurposing results from large-scale siRNA screens to identify novel candidate targets. Several studies have used this approach to gain novel insights into disease pathways and, ultimately, novel therapeutic targets.
Another application is the evaluation of siRNAs as oligonucleotide therapeutics, including the impact of chemical modifications, formulation or dosing on their off-target repertoire.
siRNA off-target effects are typically mediated through miRNA-like binding of the siRNA seed to the 3’ UTR of target mRNAs. Several approaches to identify siRNA off-target effects use the seed sequence to predict these off-targets. However, these approaches are prone to false positive predictions and do not provide any information on the magnitude of the effect. For instance, an off-target gene that is repressed by only 20% may be less relevant than one that is repressed 2-fold.
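To put the two effect sizes in that example on the same scale as the log2 fold changes (log2FC) used later in this talk, here is a minimal sketch of the conversion. The function name is hypothetical and only serves as an illustration.

```python
import math

def repression_to_log2fc(fraction_repressed):
    """Convert a fractional repression (0.2 means 20% lower expression)
    into a log2 fold change relative to the untreated control."""
    if not 0 <= fraction_repressed < 1:
        raise ValueError("fraction_repressed must be in [0, 1)")
    return math.log2(1.0 - fraction_repressed)

# 20% repression: expression drops to 0.8x of control
print(round(repression_to_log2fc(0.20), 2))  # -0.32
# 2-fold repression: expression drops to 0.5x of control
print(round(repression_to_log2fc(0.50), 2))  # -1.0
```

A 2-fold repressed gene therefore sits at a log2FC of -1, roughly three times further from zero than a gene repressed by 20%.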
To tackle these issues, one needs to integrate gene expression data and run differential expression analysis to identify genes that are downregulated upon siRNA treatment. However, this requires a high-throughput and cost-effective method.
To this end, we developed HTTargetSeq, an RNA-sequencing workflow that can be applied directly to cell lysates from 96-well culture plates.
We typically require 4 replicates per condition and apply a 3’ end-sequencing library prep workflow with shallow sequencing (1-5M reads per sample). This results in reproducible detection of around 7,000-10,000 genes, to which we apply several data analysis workflows to map and prioritize siRNA off-target genes.
Slide 4 demonstrates the technical performance of HTTargetSeq. The left plot shows the cumulative distribution of the number of detected genes across 384 samples, with a median of 7,000 genes per sample. You can see from the middle plot that the reads are concentrated at the 3’ end of the genes, and the third plot demonstrates that gene expression counts are very reproducible between technical replicates in the workflow.
Slide 5 presents a real case study from one of our customers who ran an siRNA screen in the context of fibrosis, which demonstrates the potential of HTTargetSeq. The goal of this screen was to identify regulators of the TGFB phenotype by screening a panel of 1617 genes with 3 independent siRNAs per gene.
There were essentially 2 types of results. The first type looks like the result for gene A, where all 3 siRNAs induce target knockdown (KD) and also induce the phenotype of interest. In this case, gene A would be labeled as a hit. There were, however, also cases like the one shown for gene B, where all 3 siRNAs induce target KD, but only one induces the phenotype of interest. Gene B would not be labeled as a hit, and siRNA 3 is likely affecting one or more modulators of the TGFB pathway through off-target effects.
86 of these ‘off-target’ siRNAs were identified in this screen and all were selected for analysis with HTTargetSeq. Two positive controls, siRNAs against TGFBR1 and TGFBR2, were also included. The plots on slide 8 demonstrate that HTTargetSeq clearly detects the downregulation of TGFBR1 and TGFBR2 mRNA expression in these control conditions. Pathway analysis using the gene expression data demonstrated that the TGFBR1 and TGFBR2 KD is functional, as several TGFB pathway gene sets were identified as significantly downregulated.
When looking at the 86 ‘off-target’ siRNAs, HTTargetSeq identifies hundreds of differentially expressed genes for the majority of the siRNAs. As expected, most differential genes are downregulated and these likely harbor the off-target genes.
To verify that genes with seed sites for our siRNAs are indeed preferentially downregulated, we plotted the log2FC for genes without a seed site (black) and genes with different canonical seed types. These seed types are defined based on the number of complementary bases between the siRNA seed and the mRNA. From the miRNA field, we know that a higher degree of complementarity is associated with higher potency and more downregulation (see plot on the right). This is exactly what we observed in the HTTargetSeq dataset as well. Genes with 8mer seed sites showed stronger downregulation than genes with 7mer or 6mer seed sites. Note that almost half of the genes that have a seed site do not show downregulation, underscoring the potential issues with false positives when relying on seed predictions alone. In addition, only a fraction of genes with seed sites (5%) show a 2-fold or higher downregulation, and these are likely the most biologically relevant genes.
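To make the seed types concrete, here is a minimal sketch of how a 3’ UTR could be classified by its strongest seed site. It assumes the site definitions commonly used in miRNA target prediction (6mer: match to guide positions 2-7; 7mer-m8: positions 2-8; 7mer-A1: positions 2-7 followed by an A; 8mer: positions 2-8 followed by an A). The talk does not state which exact definitions were used, so treat these, the function names and the example sequences as illustrative assumptions.

```python
# Watson-Crick complement table for RNA
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]

def classify_seed_site(guide, utr):
    """Return the strongest canonical seed site type found in `utr` for an
    siRNA guide strand (both written 5'->3'), or None if there is no site.
    Site definitions are an assumption, see the text above."""
    guide = guide.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    match_2_7 = reverse_complement(guide[1:7])  # pairs with guide positions 2-7
    match_2_8 = reverse_complement(guide[1:8])  # pairs with guide positions 2-8
    # Ordered from strongest to weakest, so the first hit wins
    site_types = [
        ("8mer", match_2_8 + "A"),
        ("7mer-m8", match_2_8),
        ("7mer-A1", match_2_7 + "A"),
        ("6mer", match_2_7),
    ]
    for name, motif in site_types:
        if motif in utr:
            return name
    return None

guide = "UGAGGUAGUAGGUUGUAUAGU"  # arbitrary example guide strand
print(classify_seed_site(guide, "GGGCUACCUCAGGG"))  # 8mer
print(classify_seed_site(guide, "GGGGUACCUCGGG"))   # 6mer
```

Note that a classifier like this only says whether a site is present. As the plot shows, presence alone does not tell you whether the gene is actually downregulated.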
To prioritize off-target genes, we typically integrate additional features, most importantly recurrence. Genes that are identified as off-targets of several of the 86 screened siRNAs are more likely to play an important role in the pathway.
We therefore focused on genes that had at least one 8mer seed site and showed a 2-fold downregulation. The table shows the top 10 most recurrent genes that match these criteria. Interestingly, we identified TGFBR1 as one of the most recurrent off-target genes, validating our approach. Several other components of the TGFB pathway were found among the top-ranked genes. More importantly, genes that have not previously been associated with the TGFB pathway were also identified.
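The recurrence ranking described here can be sketched as follows. This is a simplified illustration, not the actual analysis pipeline: the input layout, the function name, the gene and siRNA identifiers and the values are all hypothetical, and the 2-fold criterion is interpreted as a log2FC of -1 or lower.

```python
from collections import defaultdict

def rank_recurrent_off_targets(records, max_log2fc=-1.0, required_site="8mer"):
    """Rank genes by the number of distinct siRNAs for which they pass the
    off-target filter (required seed site type and log2FC <= max_log2fc).

    `records` is an iterable of (sirna_id, gene, log2fc, site_type) tuples.
    Returns a list of (gene, n_sirnas), most recurrent first.
    """
    sirnas_per_gene = defaultdict(set)
    for sirna_id, gene, log2fc, site_type in records:
        if site_type == required_site and log2fc <= max_log2fc:
            sirnas_per_gene[gene].add(sirna_id)
    # Sort by descending recurrence, then alphabetically for stable output
    ranked = sorted(sirnas_per_gene.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(sirnas)) for gene, sirnas in ranked]

# Made-up example data, for illustration only
records = [
    ("si01", "GENE_X", -1.4, "8mer"),
    ("si02", "GENE_X", -1.1, "8mer"),
    ("si03", "GENE_X", -0.3, "8mer"),     # effect too weak, filtered out
    ("si01", "GENE_Y", -2.0, "8mer"),
    ("si02", "GENE_Y", -1.5, "7mer-m8"),  # not an 8mer site, filtered out
    ("si03", "GENE_Z", -1.2, "6mer"),     # not an 8mer site, filtered out
]
print(rank_recurrent_off_targets(records))
# [('GENE_X', 2), ('GENE_Y', 1)]
```

Counting distinct siRNAs rather than rows keeps a gene from being ranked higher just because the same siRNA appears more than once in the input.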