Early stages of drug discovery often depend on relatively simple reporter assays or phenotypic readouts, providing little or no information on the drug’s mechanism of action (MOA). Gene expression profiling technologies like RNA-sequencing enable a more comprehensive characterization of compounds by measuring the activity of molecular pathways. This information can complement phenotypic readouts and can be used to prioritize candidate compounds for further testing. RNA expression profiling also serves as a generic test that can be applied to any drug development pipeline without the need for target-dependent customization.
In this webinar, Prof. Pieter Mestdagh, senior scientist at Biogazelle, presented two novel workflows (HTPathwaySeq/HTTargetSeq) that process 384 cell lysates with RNA-seq to generate expression data analyzed at the pathway level. The data presented demonstrate that shallow sequencing of crude cell lysates reproducibly detects over 5,000 genes with at least 10 reads. Subsampling of deep sequencing datasets showed that differential pathway analysis is largely unaffected when reducing the number of genes to this level. Consequently, reliable pathway insights can be obtained at high throughput and relatively low cost while not being limited to a predefined set of genes or pathways. In cell perturbation screenings (small molecules, RNAi, antisense or CRISPR), the application can provide in-depth information on the mode of action underlying the induced cellular phenotypes as well as molecular similarity scores to identify those perturbations acting similarly to a reference condition or via shared molecular mechanisms. This methodology can also be applied to repurpose off-target siRNA hits from library screens to reveal novel candidate therapeutic targets for drug development.
Pieter also discussed how coupling the data generated from the workflow with a tailored visualization platform can facilitate data interpretation enabling:
Good day to everyone joining us and welcome to today's Xtalks webinar. Today's talk is entitled Applications of High Throughput Gene Expression Profiling in Early Drug Discovery. My name is Mira, and I'll be your Xtalks host for today. Today's webinar will run for approximately 60 minutes. This presentation includes a Q&A session with our speaker. This webinar is designed to be interactive and webinars work best when you're involved. So please feel free to submit questions and comments for our speaker throughout the presentation using the questions chat box, and we'll try to attend to your questions during the Q&A session. This chat box is located in the control panel on the right-hand side of your screen. If you require any assistance, please contact me at any time by sending a message using this chat panel. At this time, all participants are in listen-only mode. Please note that this event will be recorded and made available for streaming on Xtalks.com.
At this point, I'd like to thank Biogazelle, who developed the content for this presentation. Biogazelle is a CRO specializing in high value applications to support pharmaceutical research, clinical trials, and diagnostic test development. To accelerate the development of small molecules, RNA targeted drugs, and adoptive cell therapies, they apply a suite of genomic and transcriptomic technologies to find and validate RNA biomarkers and to assess efficacy, safety and toxicity. They hold a unique forefront position in the application of quantitative PCR, digital PCR, and dedicated RNA sequencing workflows on precious clinical samples, such as liquid biopsies and FFPE tissues. The laboratories are ISO/IEC 17025:2005 accredited, and PCR-based services can be performed in GCLP compliance.
Now, I'd like to introduce our speaker for today's event. Pieter Mestdagh is a senior scientist at Biogazelle as well as associate professor at the faculty of medicine and health sciences at Ghent University, Belgium. He obtained master's degrees in industrial engineering and biochemistry in 2004, and in bio-science engineering and cell and gene biotechnology in 2006, as well as a PhD in biomedical sciences in 2011. He is the author of more than 100 scientific articles in international journals as well as six European patents. And now without further ado, I'd like to hand the mic over to our speaker. You may begin when ready.
Good morning and good afternoon. Thanks for the introduction Mira. In the next 40 minutes or so I will demonstrate two applications that were developed by Biogazelle to support early phases of drug discovery.
Those two applications are called HTPathwaySeq and HTTargetSeq. The technology behind these two applications is nearly identical, I will go into more detail later, but their application is slightly different. So while HTPathwaySeq is mainly focused on generating pathway level information based on molecular gene expression profiling, HTTargetSeq goes beyond pathway level profiling and gets you information down to the individual gene level. And so the applications of course are slightly different. For HTPathwaySeq, I will try to explain and demonstrate that this is mainly a technology that we propose to introduce during early phases of drug discovery to quantify or characterize the mode of action of a set of candidate compounds, to generate molecular toxicity profiles, or to evaluate molecular similarity between compounds. HTTargetSeq, on the other hand, can actually provide all of the information that HTPathwaySeq can, but on top of that goes beyond pathway level information and allows us to look at individual genes. And this is a technology that will be mainly positioned in the field of oligonucleotide drug development, where quantifying on- and off-target effects of oligonucleotide drugs is an important step in the development process.
I think this is a process that doesn't need too much explanation. These are the various stages of drug discovery. And the technologies that I will present today are mainly positioned in the phase where, once you have identified a set of compounds of interest, you want to establish the activity of these compounds. This is where HTPathwaySeq can play a significant role. So typically this phase is supported by, I would say, reporter assays or certain phenotypic readouts, but these do not typically provide a lot of information on the mechanism of action of the drug. And so by generating a molecular profile, this type of information can be revealed. And this is where we position HTPathwaySeq.
So molecular profiling or RNA profiling is something we believe in, and we're not the only ones, of course. Several others have proposed to introduce RNA profiling during these phases of drug discovery for a number of reasons. And so there's a lot of upside potential to RNA profiling. First of all, based on the RNA profile, you can of course establish the activity of individual pathways. And this type of information can be used to complement your phenotypic readout and to better characterize and prioritize the compounds or drugs that you are currently evaluating. And what is more is that RNA profiling is a generic test. There is no need to customize your readout based on the phenotype or the cell type that you're studying. It's a test that can always be applied irrespective of the mechanism of action of your drug, irrespective of the phenotypic effects, and irrespective of the cell types that you're working with. Now, in order to take advantage of RNA profiling during those stages of drug development, there's a number of requirements, mainly technology requirements, that need to be met. So first of all, the technology that you would apply to do this needs to be high-throughput. It needs to be able to cope with several compounds and different concentrations, so many different conditions are typically tested in such a phase, and your technology has to be compatible with that. Second, of course, it has to be cost effective. And thirdly, the technology should provide sufficient coverage of the transcriptome to allow you to infer pathway activities. And so before I continue, there's a short poll question that we would like to raise. And Mira, if you could bring it up now, that would be good.
Thank you, Pieter. Audience members, a poll question has just shown up on your screen. So the poll question reads as follows: which of the following represent challenges that you face during your early stage drug discovery process? Please select all that apply. You can select any of the options by clicking the corresponding answers. The options are assessing the impact of dosage on pathway activity, assessing the impact of formulation on pathway activity, understanding a compound's molecular mechanism of action, predicting compound toxicity, or other. So the question again reads, which of the following represent challenges that you face during your early stage drug discovery process? Please select all that apply.
I've received a majority vote and I am now ending the poll question and sharing the results. The results are the following: 67% said understanding a compound's molecular mechanism of action. 52% said predicting compound toxicity. 44% said assessing the impact of dosage on pathway activity. 19% said assessing the impact of formulation on pathway activity. And 11% said other. So back to you, Pieter.
Okay, thank you, Mira. And thanks for answering that question. So let me continue with explaining what we have developed, the technology that we've developed that actually meets those technology requirements and can be used to apply in those phases of drug discovery.
So, of course, I think it's clear that the technology that could potentially do all of this is RNA sequencing. RNA sequencing allows you to quantify the expression of thousands of genes for many different conditions. However, the classic standard protocols to apply RNA sequencing do not meet the requirements of being cost effective and high-throughput. RNA sequencing is nevertheless the basis of the HTPathwaySeq and HTTargetSeq applications that we have developed.
We've made a few modifications that allow us to apply this technology in high throughput in a very cost effective manner. So first, the HTPathwaySeq and HTTargetSeq applications are capable of generating gene expression profiles starting directly from crude cell lysates. And so this is a very important aspect because it removes the need for cumbersome and time consuming RNA isolation procedures that you would need to perform on dozens, if not hundreds, of conditions. So what we have done is developed a technology that works directly on cell lysates. So basically you perform your experiment in 96-well cell culture plates, and you can apply any perturbation that is of interest: compounds if you're doing drug development, antisense or siRNAs if you're looking into oligonucleotide drug development, or other perturbations for that matter. We typically include four biological replicates for each of the conditions. And a single experiment is composed of 384 samples. So roughly 96 different conditions can be tested in a single experiment. So that's the first thing. Second thing is that we apply technology that allows us to generate those libraries relatively quickly and efficiently. And this technology is based on a 3' end RNA-sequencing workflow that only sequences the 3' ends of polyadenylated transcripts. And to make it cost effective, we are actually not sequencing that deep. We are generating between one and five million reads for each sample. And this is also where the difference between HTPathwaySeq and HTTargetSeq is situated. For HTPathwaySeq we will typically generate no more than one million reads per sample. For HTTargetSeq we go up to five million reads per sample.
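To make the numbers above concrete, here is a minimal sketch of the experiment arithmetic. It only uses the figures quoted in the talk (384 samples, four replicates, one or five million reads per sample); the function names are illustrative and not part of any Biogazelle tool.

```python
# Figures quoted in the talk; everything else here is an illustrative assumption.
SAMPLES_PER_EXPERIMENT = 384
REPLICATES_PER_CONDITION = 4
READS_HTPATHWAYSEQ = 1_000_000   # approximate reads per sample
READS_HTTARGETSEQ = 5_000_000    # approximate reads per sample


def conditions_per_experiment(samples=SAMPLES_PER_EXPERIMENT,
                              replicates=REPLICATES_PER_CONDITION):
    """Number of distinct conditions that fit in one experiment."""
    return samples // replicates


def total_reads(reads_per_sample, samples=SAMPLES_PER_EXPERIMENT):
    """Total sequencing reads needed for one full experiment."""
    return reads_per_sample * samples


n_conditions = conditions_per_experiment()
pathway_reads = total_reads(READS_HTPATHWAYSEQ)
target_reads = total_reads(READS_HTTARGETSEQ)
print(n_conditions, pathway_reads, target_reads)
```

Controls such as vehicle wells also count as conditions, so the number of test compounds per experiment will be somewhat lower than the total number of conditions.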
And even at that sequencing depth, and I will demonstrate it with a number of examples, even at that shallow sequencing depth, you get sufficient coverage on the transcriptome to infer pathway activity, or even differential gene expression. The number of genes that you can identify ranges from 7,000 to 10,000, depending on whether you're applying HTPathwaySeq or HTTargetSeq. And then of course, there's data analysis that allows you to look at the data from a pathway perspective or differential gene expression perspective. And that allows you to integrate additional levels of information to better understand what exactly is happening on let's say a molecular level in the cells that are treated with different compounds.
So a bit of, let's say, quantitative information on the technologies themselves. This is typically what you get from an HTPathwaySeq experiment in terms of the number of genes, the coverage of the transcriptome and the reproducibility. So in the left hand plot, you can see that the median number of detected genes across the 384 samples of a single experiment is around 7,000. You have samples where this number is a bit lower. You have samples where this number is slightly higher. But so roughly on average, 7,000 genes are detected per sample. And these 7,000 genes are genes that have at least 10 counts in the RNA sequencing readout. You can see that we are doing 3' end sequencing. If we look at the gene body coverage in the central plot, you can see that the majority of reads are indeed piling up at the 3' ends of the genes. And then finally, in the right hand plot, you can appreciate the high level of reproducibility that we can obtain. These are technical replicates, so two different samples within the plate that are lysed, prepped for RNA sequencing and analyzed. So you can see we get pretty high reproducibility despite the fact that this is done directly on crude cell lysates.
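The detection metric described here (genes with at least 10 counts, summarized as a median across samples) can be sketched as follows. The toy count table and the well names are made up for illustration; a real experiment would have 384 samples and thousands of genes.

```python
from statistics import median

MIN_COUNTS = 10  # detection threshold quoted in the talk


def detected_genes(counts, min_counts=MIN_COUNTS):
    """Count genes with at least `min_counts` reads in one sample.

    counts: dict mapping gene name -> read count.
    """
    return sum(1 for c in counts.values() if c >= min_counts)


def median_detected(samples, min_counts=MIN_COUNTS):
    """Median number of detected genes across samples.

    samples: dict mapping sample name -> per-gene count dict.
    """
    return median(detected_genes(s, min_counts) for s in samples.values())


# Hypothetical toy data: three wells, four genes.
toy = {
    "well_A1": {"GENE1": 25, "GENE2": 9, "GENE3": 10, "GENE4": 0},
    "well_A2": {"GENE1": 40, "GENE2": 12, "GENE3": 11, "GENE4": 3},
    "well_A3": {"GENE1": 5, "GENE2": 2, "GENE3": 30, "GENE4": 0},
}
print(median_detected(toy))
```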
So to identify differential pathways, we start with a differential gene expression analysis using the gene expression data that is generated with the HTPathwaySeq technology, and actually apply a gene set enrichment analysis based approach to identify those gene sets that are significantly enriched among the genes that are either up-regulated or down-regulated in each of the conditions that are being tested. And to do so, we make use of a wide variety of publicly available gene set collections, amongst others the Molecular Signatures Database. And so these gene set collections represent both canonical pathways and various curated gene lists representing chemical and genetic perturbations. This gene set enrichment analysis based approach has been shown to be a method that provides you with the highest accuracy when performing non-topology-based pathway enrichment analysis. Furthermore, if that would be of interest, there's also the possibility to include custom gene sets. So if you have collections of genes you would like to see evaluated for their enrichment among the up- or down-regulated genes in each of the conditions that you are testing, these collections of genes can be turned into gene sets and can be incorporated in the gene set enrichment analysis.
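For readers unfamiliar with gene set enrichment analysis, the following is a minimal sketch of the classic running-sum enrichment score on a ranked gene list. It is not the speaker's pipeline, whose implementation is not described in the talk; it leaves out the permutation testing and FDR estimation that a real analysis needs, and the gene names and scores are invented.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score of `gene_set` in a ranked gene list.

    ranked_genes: genes sorted by decreasing ranking metric.
    scores: dict gene -> ranking metric (e.g. a differential expression statistic).
    Returns the running-sum value with the largest absolute deviation from zero;
    positive means enrichment among up-regulated genes, negative among down-regulated.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    hit_norm = sum(abs(scores[g]) ** weight
                   for g, h in zip(ranked_genes, hits) if h)
    if hit_norm == 0:
        raise ValueError("all gene-set members have a zero ranking metric")

    running, best = 0.0, 0.0
    for gene, is_hit in zip(ranked_genes, hits):
        if is_hit:
            running += abs(scores[gene]) ** weight / hit_norm  # step up on a hit
        else:
            running -= 1.0 / n_miss                            # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical toy ranking of six genes.
toy_scores = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": -2.0, "F": -3.0}
toy_ranked = sorted(toy_scores, key=toy_scores.get, reverse=True)
es_up = enrichment_score(toy_ranked, toy_scores, {"A", "B"})
es_down = enrichment_score(toy_ranked, toy_scores, {"E", "F"})
print(es_up, es_down)
```

A custom gene set, as mentioned above, would simply be another set of gene names passed as `gene_set`.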
Just to show you that even with 7,000 genes you can pick up the most relevant pathways that are changing between two conditions, we've done a number of simulations. These are not really simulations, this is on real data, where we are comparing two groups in three different datasets in two ways: one where we use all of the available genes in that dataset (so typically 15,000, more or less) and one where we only use the 7,000 most abundant genes. So these are typically the genes we detect with our HTPathwaySeq method. And so, as you can see in all of those datasets, the Y axis is representing the number of enriched pathways that are found, whether you use all genes or just the top 7,000 genes. And in green is depicted what portion of those is shared between using all genes and using only 7,000 genes. So you can see that in all three datasets, the majority of the gene sets that are found to be differentially regulated between the two conditions when using all of the genes are actually also retrieved when using just the top 7,000 genes. So even with a coverage of only 7,000 genes, half of the transcriptome, more or less, you still have the ability to infer in great detail the differentially expressed pathways between conditions.
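The comparison described here boils down to two steps: restrict the data to the most abundant genes, rerun the enrichment analysis, and measure how many of the originally enriched gene sets are recovered. A minimal sketch of those two helper steps, with made-up gene and pathway names and with the enrichment analysis itself left out:

```python
def top_abundant(mean_counts, n):
    """Return the n genes with the highest mean count (the talk uses n = 7,000)."""
    ranked = sorted(mean_counts, key=mean_counts.get, reverse=True)
    return set(ranked[:n])


def shared_fraction(enriched_all, enriched_top):
    """Fraction of gene sets found with all genes that are also found with the subset."""
    if not enriched_all:
        return 0.0
    return len(enriched_all & enriched_top) / len(enriched_all)


# Hypothetical toy inputs.
toy_means = {"G1": 500.0, "G2": 120.0, "G3": 15.0, "G4": 2.0}
kept = top_abundant(toy_means, 2)

found_with_all_genes = {"PATHWAY_A", "PATHWAY_B", "PATHWAY_C", "PATHWAY_D"}
found_with_top_genes = {"PATHWAY_A", "PATHWAY_B", "PATHWAY_C", "PATHWAY_E"}
overlap = shared_fraction(found_with_all_genes, found_with_top_genes)
print(kept, overlap)
```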
And so by doing so you can get insights in the mechanism of action of your compounds. You can get insights in potential toxicity, molecular toxicity profiles of your compounds. Or you can compare the molecular profile of your compound with molecular profiles of established drugs for instance, and try to get a feeling and understanding of the mechanism of action.
I'll demonstrate just with a few highlights showing a number of known compounds and how they behave in such an analysis. So this is an experiment that we've done for a customer, where we can share the information. It's an experiment where basically 90 different compounds were included for screening with HTPathwaySeq in four replicates, together with some vehicle controls and some compounds with a known mechanism of action. One of those compounds was TSA, a compound with a known mechanism of action, and on the right hand side, you can see the differentially activated or repressed pathways in the condition treated with TSA versus the vehicle control. And so, as you can immediately appreciate from the list of activated and repressed pathways, the molecular activity of TSA is immediately revealed in the pathways that are activated or repressed upon treatment, with gene sets related to HDAC inhibition and gene sets related to treatment with TSA as a compound. So this is demonstrating that indeed you can retrieve molecular profiles, molecular signatures, that are directly associated with the mechanism of action of your compound.
Not only can you look at the impact of an individual condition on the molecular profile, you can also start comparing, for instance, different concentrations of a certain compound and evaluate relationships between the concentration of the compound and the differential activity of a gene set. And so these are just two examples, two compounds that were screened, where you can see that the two gene sets that are shown for the left compound are actually only repressed at the highest concentration. Whereas on the right hand side, you can see that for this compound, these are gene sets where you really see a very nice dose response relationship between the concentration that was administered and the differential activity of the pathway or gene set.
So this is the type of information you can obtain. What you can also do is compare compounds based on their molecular profile, and generate a compound similarity matrix, where compounds are clustered based on their underlying molecular profile. And so this can help you if you have certain chemical variations of a set of lead compounds, or certain groups of compounds that have the same phenotypic effect but may work through a different or slightly different mechanism of action. This allows you to reveal this type of associations or similarities between compounds, going way beyond the more simple phenotypic readouts. So really using information from thousands of pathways and their activities to evaluate compound similarity.
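One simple way to build such a similarity matrix is to correlate the per-gene-set enrichment scores of each pair of compounds. The talk does not say which similarity measure is used, so the Pearson correlation below is an assumption, and the compound names and scores are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length score vectors.

    Assumes neither vector is constant (a constant vector has zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def similarity_matrix(profiles):
    """profiles: dict compound -> list of enrichment scores (same gene-set order)."""
    names = sorted(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}


# Hypothetical enrichment-score profiles over four gene sets.
toy_profiles = {
    "cmpd_A": [1.0, 2.0, 3.0, 4.0],
    "cmpd_B": [2.0, 4.0, 6.0, 8.0],   # same pattern as cmpd_A, stronger effect
    "cmpd_C": [4.0, 3.0, 2.0, 1.0],   # opposite pattern
}
sim = similarity_matrix(toy_profiles)
print(sim["cmpd_A"]["cmpd_B"], sim["cmpd_A"]["cmpd_C"])
```

The resulting matrix can then be fed into hierarchical clustering for a heat map, or into a dimensionality reduction method for a two-dimensional view.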
The number of applications goes way beyond what I can show you in this short amount of time. But finally, what you can also do is single out those gene sets that reflect certain canonical toxicity pathways. There are a number of canonical toxicity pathways that have been described in the literature, so this is DNA damage, oxidative stress, heat shock, hypoxia, and so on. And so for each of these pathways, there exist gene signatures that represent the activity of the individual pathways. So what we can do is in fact zoom in on those gene sets and evaluate how certain compounds are influencing the activity of these pathways in your cellular system. And this may reveal certain levels of toxicity that are unwanted and induced by one or multiple of your compounds, or that are induced at certain concentrations at which the compound is supplied.
Now, to help you browse through all of that information, because it's a lot of information, of course, we have developed a suite of data analysis tools that allow you to navigate and browse through the results, generate certain levels of visualization and provide a bit more interpretation and detail. And I will not do a live demonstration, but I will show some screenshots of the tool that we developed.
The tool is called Savannah, and it is an R Shiny application. And it allows you to explore your HTPathwaySeq data at a number of different levels. And some of them I already touched upon, so you can get more general results overviews, you can look at individual contrasts, toxicity, similarity, and so on.
Now I'll just show you some of the visualizations that are included. So this is a general overview of an experiment that had, in this case, 72 contrasts of interest. Each of the contrasts here represents a certain compound at a certain concentration, and for each of the contrasts you can see the overall impact on differential pathway activity. So this is the number of gene sets that are significantly activated or repressed when applying that compound. And so you can see that there are compounds that really have hardly any effect. You can see that there are compounds that have very strong effects.
And you can zoom in on certain sets of compounds by filling in some annotation information in the table on the left, and get a view on individual contrast, color code them, for instance based on their concentration. And then see, for instance, that this is a compound where you can see a nice dose response effect on the number of gene sets that are induced.
You can also explore individual contrasts in more detail. So this is a contrast viewer that allows you to select your contrast of interest and bring up all the results associated with that contrast. So this is again our TSA compound. First of all, you get a table of the significantly enriched gene sets, both positive and negative, together with the FDR value and the enrichment score, which actually denotes the direction of the enrichment, being positive or negative.
And you can click any of these gene sets and actually bring up the detailed information behind the enrichment of that gene set, being the gene set enrichment analysis plot, which is shown on the left, which could be informative for those of you that are familiar with this type of representation, but more importantly, on the right hand side is actually the list of genes that are driving the enrichment. So this allows you to go into a very high level of detail and allows you to explore which genes in that gene set are actually responsible for the gene set being enriched, either positively or negatively. And so this helps you to better interpret and get more detail into the impact of the compound on the gene set and by extension the genes belonging to that gene set.
You can evaluate similarity between compounds. So this is a two dimensional representation using a t-SNE plot, showing all of the compounds included in this experiment, clustered based on their underlying molecular profile. So this is an entirely different way of looking at compounds. All of these compounds may induce the same phenotype, but molecularly may act slightly differently. And so this allows you to see patterns. This is the similarity matrix, which you can also bring up by clicking at the top. You have the heat map representation, but you can also bring up this two dimensional type of representation using a t-SNE plot.
You can look at the toxicity profile. So there's a toxicity viewer that actually focuses on those gene sets associated with canonical toxicity pathways, and allows you to explore which compounds, which conditions are potentially inducing pathways associated to those unwanted toxicity profiles.
Okay. So that's it for HTPathwaySeq. I hope I've been able to show you where we position this technology, how the technology works, and what type of information can be generated by this technology. Now, I want to take it one step further and just talk a little bit about an extension of the HTPathwaySeq technology, which we called HTTargetSeq. Now, as I showed you before, the difference between HTPathwaySeq and HTTargetSeq is situated at the level of sequencing depth. By sequencing a bit deeper, meaning instead of one million reads per sample we go to five million reads per sample, we get more coverage of the transcriptome and we get even more robust results in terms of gene counts, which gives us the ability to look at the differential expression status of individual genes. And this is something we believe can be of interest when working in the field of oligonucleotide drug development.
So for people that are familiar with the typical process of oligonucleotide drug development, you have antisense oligonucleotides or siRNAs, for instance, as two examples of oligonucleotide drugs. The development process typically starts with hundreds of candidates directed against your target gene of interest, and then these are selected in a step-wise process based, first of all, on their ability to silence your target gene, of course. But further downstream in that selection cascade, for those oligonucleotide drugs that have high efficiency in terms of target knockdown, it also becomes relevant to start to establish their potential off-target effects. So these oligonucleotide drugs are very short and they are notorious for their potential to also aspecifically regulate these so-called off-target genes. So being able to map those off-target genes can help you to further streamline and fine tune the selection process. On the other hand, mapping the impact of certain chemical modifications or dosing or formulation of your oligonucleotide drug on the off-target repertoire could also be of interest.
Another application where identifying off-target genes from oligonucleotides, where that may be relevant is in a process where siRNA screens are actually being repurposed to identify novel therapeutic targets. So for those of you that are familiar with these siRNA screens, these large scale screens where thousands of siRNAs are transduced or transfected in a cell or cell line of interest followed by a certain phenotypic readout, and then you always have hits that are hits because of the silencing of the target gene of this siRNA. But you also have hits that are a hit because of off-target effects. And so by quantifying or identifying those off-target genes that also result in the phenotype of interest, you can actually start identifying new components of your pathway of interest and potentially novel therapeutic targets. And this is also something that has been done in the past and is still being done. There's a lot of information in those siRNA screens, those thousands of siRNA screens that have been done in various disease areas where the off-target based effects were always discarded, but actually there's a lot of potential information. And I will try to demonstrate the use of the application of HTTargetSeq in this area, where we have done a project for a customer that actually resulted in a lot of interesting information and I will try to explain it. So I will now focus on siRNAs, but remember that everything I say is also relevant for antisense oligonucleotides and is relevant in any type of context where the idea or the goal is to identify off target genes.
So for siRNAs specifically, off-target genes are driven by microRNA-like off-targeting, so the seed of the siRNA, which is actually the first nucleotides of the 5' ends of the siRNA, will bind primarily to the 3' UTR regions of target messenger RNAs, and cause off-target effects using microRNA-like off targeting. Now, there's a number of tools that are purely driven by bio-informatics analysis to identify candidate off target genes from siRNA, based on siRNA sequences. By really screening for the presence of the seed in the 3' UTR of messenger RNAs.
Now, these algorithms, these bioinformatics-based approaches, they have a number of limitations of course. First of all, there's a lot of false positive predictions. It's not because there's a match between the seed of the siRNA and the UTR of a target gene that the siRNA will effectively bind there and cause a knock down of the target gene. And secondly, there is no information whatsoever on the magnitude, the expected magnitude of the effects of that binding event, the potential binding effect. So in order to compensate for those limitations, you actually need to complement your bio-informatics predictions with gene expression data, and more importantly, with differential gene expression data, where you compare gene expression between cells. So again, this requires methods that are high throughput and cost effective.
Once more, the HTTargetSeq advantage: it is the same technology, but with five times higher sequencing depth. You get a few more genes and you get the option to do robust differential gene expression analysis.
So here we have the case study. In this case, a customer had done an siRNA screen in the context of fibrosis, looking for a TGFB-driven phenotype. And so 1,617 genes were selected to screen with three siRNAs per gene. So it was an siRNA screen of roughly 5,000 siRNAs. And hits were actually defined as genes that repress TGFB signaling, which was read out based on a reporter assay. So genes that repress TGFB signaling upon knockdown were defined as hits. There are two types of hits you can get from such a screen. You can have hits like the one shown for gene A, where all three siRNAs directed against gene A give a knockdown of gene A and induce the phenotype of interest. So in that case, gene A was considered a modulator of the TGFB pathway. But there are also a lot of situations like the one shown for gene B, where all three siRNAs give you a knockdown of gene B, but only one of the siRNAs induces the phenotype, suggesting that siRNA 3 for gene B is actually inducing the TGFB phenotype through one or multiple off-target genes. And so by identifying those off-target genes, you can potentially get new insights into the pathway and, by extension, reveal new therapeutic targets, as knocking them down clearly induces the phenotype of interest.
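The gene A versus gene B logic can be written down as a small classification rule. This is a sketch of the reasoning as described, not the customer's actual selection procedure: it assumes every siRNA knocks down its intended target, and the "ambiguous" label for two out of three positive siRNAs is an added assumption, since the talk does not cover that case. The siRNA identifiers are invented.

```python
N_GENES = 1617          # genes in the screen, as quoted in the talk
SIRNAS_PER_GENE = 3
N_SIRNAS = N_GENES * SIRNAS_PER_GENE   # the "roughly 5,000 siRNAs"


def classify_gene(phenotype_by_sirna):
    """Classify a gene from the phenotype calls of its siRNAs.

    phenotype_by_sirna: dict siRNA id -> True if that siRNA induced the phenotype.
    Returns (label, list of phenotype-positive siRNAs).
    """
    positives = sorted(s for s, induced in phenotype_by_sirna.items() if induced)
    if not positives:
        return "no hit", positives
    if len(positives) == len(phenotype_by_sirna):
        return "on-target modulator", positives     # gene A situation
    if len(positives) == 1:
        return "off-target candidate", positives    # gene B situation
    return "ambiguous", positives                   # assumption: not covered in the talk


gene_a = classify_gene({"A_si1": True, "A_si2": True, "A_si3": True})
gene_b = classify_gene({"B_si1": False, "B_si2": False, "B_si3": True})
print(N_SIRNAS, gene_a, gene_b)
```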
So from these 1,617 genes, 86 siRNAs were selected that were presumed, based on the scheme I just showed, to induce the phenotype of interest through off-target effects. So these 86 siRNAs were screened with HTTargetSeq, together with two positive controls against known regulators of the TGF-beta pathway, being TGFB receptor one and two. So these were screened with HTTargetSeq to reveal the off-target genes. Just to show you that HTTargetSeq indeed does allow you to get very robust differential gene expression results, these are the expression levels of TGFB receptor one and two in the conditions treated with the respective siRNAs. So you can clearly see the knockdown of those genes. You can also see that you get a functional readout at the molecular level. When we look at gene sets associated with the TGF-beta pathway, you can see that both for the siRNA against receptor one and receptor two, there are a lot of gene sets that are significantly down-regulated when treating the cells with those siRNAs.
We did differential gene expression analysis for all 86 siRNAs, revealing hundreds of genes that were either up- or down-regulated. There were typically more down-regulated genes than up-regulated genes.
Now, I want to go into a little bit more detail on how siRNA off-targeting works, because it allows me to explain what you should expect to see if your data is of high quality and if you are indeed measuring the off-target effects. So if we look into the different ways microRNAs bind to their target genes, there are different types of seed matches, ranging from the six-nucleotide seed match, to two types of seven-nucleotide seed matches, and then the eight-nucleotide seed match. And what is known is that the efficiency by which a microRNA down-regulates its target gene increases from six to seven to eight, with an eight-nucleotide seed match typically giving you a more pronounced downregulation compared to a six- or seven-nucleotide seed match. As siRNAs use the exact same mechanism to down-regulate genes via this off-target mechanism, we are actually expecting to see an association between the knockdown of a gene and the presence of one of these seed matches. And this is exactly what we saw in the data. What you can see in the middle plot is actually the cumulative distribution of gene expression fold changes for genes that do not have a seed match with any of the siRNAs. This is the black line. And then for genes that have either a six-mer match, a seven-mer-A1 or a seven-mer-m8 match, so these are the two different seven-mer seed sites here, or an eight-mer seed match.
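As an illustration of the four seed match types mentioned above, here is a minimal Python sketch that scans a transcript sequence for sites complementary to an siRNA guide strand. It assumes the commonly used microRNA site definitions (six-mer: pairing with guide positions 2 to 7; seven-mer-m8: positions 2 to 8; seven-mer-A1: positions 2 to 7 plus an A opposite guide position 1; eight-mer: both). The function names and the example sequences are hypothetical, and this is not the pipeline used in the case study.

```python
# Complement table for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def classify_seed_sites(guide: str, target: str) -> list[tuple[int, str]]:
    """Return (position, site type) for every seed match of `guide` in `target`.

    Both sequences are read 5'->3'. DNA input (T) is converted to RNA (U).
    """
    guide = guide.upper().replace("T", "U")
    target = target.upper().replace("T", "U")

    core = revcomp(guide[1:7])      # target bases pairing with guide positions 2-7
    m8_base = revcomp(guide[7])     # target base pairing with guide position 8

    sites = []
    start = target.find(core)
    while start != -1:
        end = start + len(core)
        # The position-8 match sits immediately 5' of the core on the target.
        has_m8 = start > 0 and target[start - 1] == m8_base
        # The A opposite guide position 1 sits immediately 3' of the core.
        has_a1 = end < len(target) and target[end] == "A"

        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = target.find(core, start + 1)
    return sites


# Example guide (a let-7-like sequence, used here only as test input).
guide = "UGAGGUAGUAGGUUGUAUAGUU"
print(classify_seed_sites(guide, "GGCUACCUCAGG"))
```

In practice the scan would typically be restricted to 3' UTR sequences, and each gene would be assigned its strongest site type before comparing fold-change distributions.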
And what the central plot also shows is, let's say, the issue with relying only on bioinformatics-based predictions. So all of the data points here on the right-hand side of the plot are actually genes that are not down-regulated. And you can see that something like 40 to 45% of all of the genes that have at least one predicted seed match, being a six-, seven- or eight-mer, are actually not down-regulated. And there's only a very tiny fraction of genes with predicted seed sites that are repressed twofold or more. So if you would only rely on predictions, it's impossible to get information on the magnitude of the effect. You really need to have the gene expression data for that. And so this is exactly what HTTargetSeq can provide you with. And the same is true for antisense oligonucleotides, of course.
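The comparison described here can be sketched in a few lines of Python: for each seed match class, compute the empirical cumulative distribution of log2 fold changes and read off the fraction of genes that are not down-regulated and the fraction that are repressed at least twofold. The function names, the thresholds as coded (log2 fold change of 0 and -1) and the toy numbers are assumptions for illustration only, not data from the study.

```python
def ecdf(values: list[float], x: float) -> float:
    """Empirical CDF: fraction of values less than or equal to x."""
    return sum(v <= x for v in values) / len(values)


def summarize(log2fc_by_class: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Per seed class, the fraction of genes not down-regulated (log2FC >= 0)
    and the fraction repressed at least twofold (log2FC <= -1)."""
    summary = {}
    for site_class, values in log2fc_by_class.items():
        summary[site_class] = {
            "not_down": sum(v >= 0 for v in values) / len(values),
            "twofold_down": ecdf(values, -1.0),
        }
    return summary


# Toy log2 fold changes, invented purely to show the calculation.
toy = {
    "no_site": [-0.2, 0.1, 0.0, 0.3],
    "8mer": [-1.5, -0.6, -0.3, 0.2],
}
for site_class, stats in summarize(toy).items():
    print(site_class, stats)
```

A shift of the cumulative curve to the left for stronger seed classes is what one would expect if the measured repression reflects genuine seed-mediated off-targeting.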
This is just the final slide of this case study, demonstrating that the off-target genes that we found actually made a lot of sense. So we pooled all of the off-target genes that had at least one eight-mer seed match and that were down-regulated at least twofold, and then looked at which of those were recurrently identified across those 86 siRNAs. And if you then rank all of those genes, what was very interesting to see is that the second most recurrent off-target gene was actually TGFB receptor one itself, which made total sense, of course, in the light of the phenotype and the selection process. I cannot share their identity, but several of the other top-ranked genes were other known components of the TGFB pathway. But interestingly, there were also a number of genes that have not been associated with the TGFB pathway before, suggesting that novel biology is actually revealed through this type of application.
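The recurrence ranking described above can be sketched as follows: keep only gene and siRNA pairs with an eight-mer seed match and at least twofold repression, count in how many distinct siRNA conditions each gene appears, and sort by that count. The record layout, function name and placeholder gene and siRNA identifiers are hypothetical; the actual gene identities from the screen were not disclosed.

```python
from collections import defaultdict


def recurrent_off_targets(records, site="8mer", max_log2fc=-1.0):
    """Rank genes by the number of distinct siRNAs for which they carry the
    given seed site type and are repressed at least twofold (log2FC <= -1).

    `records` is an iterable of (sirna_id, gene, site_type, log2fc) tuples.
    """
    sirnas_per_gene = defaultdict(set)
    for sirna_id, gene, site_type, log2fc in records:
        if site_type == site and log2fc <= max_log2fc:
            sirnas_per_gene[gene].add(sirna_id)

    counts = {gene: len(ids) for gene, ids in sirnas_per_gene.items()}
    # Most recurrent first; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Placeholder records, invented only to exercise the function.
records = [
    ("si01", "GENE_X", "8mer", -1.4),
    ("si02", "GENE_X", "8mer", -1.1),
    ("si03", "GENE_X", "7mer-m8", -2.0),  # wrong site type, ignored
    ("si01", "GENE_Y", "8mer", -1.2),
    ("si02", "GENE_Y", "8mer", -0.5),     # less than twofold, ignored
    ("si03", "GENE_Z", "6mer", -1.8),     # wrong site type, ignored
]
print(recurrent_off_targets(records))
```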
I want to conclude here by reiterating the major advantages and applications of those two technologies. I think the major advantage is, first of all, that you can apply those technologies directly on cell lysates. And that prevents a lot of work, a lot of cumbersome RNA isolation work, with potential batch effects that are being introduced because you cannot isolate everything at the same time, and so on and so forth. So these technologies both work directly on cell lysates. All you need to do is treat your cells, we ship you a lysis buffer, you lyse your cells and you ship us your lysates. And then everything else is done on site by Biogazelle. And what you can get from that is very clear. You get differential gene and differential pathway expression analysis, with data analysis tools, approaches and procedures that allow you to get the most out of that gene expression data.
With that, I just want to thank the people involved in the development of this procedure at Biogazelle. I also thank Galapagos, one of our customers, who was so kind as to allow us to share the results from the siRNA screen in fibrosis. And I thank you very much for attending this webinar, and I'm happy to take your questions. Thank you.
Well, thank you very much for that insightful presentation. Now I would like to invite our audience members to continue sending their questions or comments right now using the questions window for the Q&A portion of this webinar. Now, I've already received some of those questions, so I'll start with those.
Our first question here is, can this method also be used with tissues or bio-fluids?
That's a good point. The method was developed to work on lysates of cultured cells. We cannot apply it directly to tissues without doing an RNA isolation first. So this is really focused on cell lysates. However, if you are capable of isolating the RNA, and we are also capable of doing that, we can apply it to isolated RNA as well. If the isolated RNA is provided in 96-well plates, we can apply the method directly to the purified RNA. So it's not restricted to cell lysates. It can be applied to RNA isolated from tissues. But one of the major advantages, of course, is the ability to apply it directly to cell lysates. So it can be applied to tissues, but it requires an RNA isolation step. Liquid biopsies are a different story. The RNA in liquid biopsies, at least in our experience, is more highly fragmented, and we have better methods than this one to probe RNA abundance in liquid biopsies. So I'm happy to discuss offline and provide more information if that's of interest.
Thank you. Our next question is, what are the sensitivity limits of the technique in terms of number of cells per sample?
So it depends on a number of factors. It depends on the cell type. It depends on the cell size. Mainly it depends on how much RNA you have per cell. So we've typically been working with cell numbers ranging anywhere between 10,000 and 30,000 to 40,000 per well of a 96-well plate. There's always the possibility to go lower. In some instances, going lower may impact the sensitivity and so may result in fewer genes being detected. So there's always this relationship, of course, between input and sensitivity. But we've never experienced issues with that in the projects that we've done so far with various different cell types. And the numbers have always been in that range. So 10, 20, 30 up to 50,000, really depending on the cell type. Every cell type also requires a certain optimal density at which the cells need to be seeded, treated and cultured. So all of that information also needs to be taken into account when setting up an experiment like this. What we can always do, if there would be doubts about sensitivity because cell numbers would be really, really low or much lower than what we typically use, is a very small pilot upfront with just a few wells to evaluate the sensitivity of the method before applying it to hundreds of samples.
Thank you. Our next question here is, can you do this with non-standard cell types, e.g. swine cells? Especially the GSEA is often difficult due to missing public data.
Yes. So there are no technical limitations to applying the technology itself in terms of the cells that you're using. The only thing that we've experienced, only once out of the dozens and dozens of cell types that we have analyzed with this method, and that's also something we always evaluate upfront, is that we've had one cell type that we could not lyse using our lysis buffer. Or let's say we could lyse it, but the RNA was very rapidly degraded. That's the only technical limitation we have seen so far, and we've seen that with one cell type out of dozens that have been evaluated. So I would say you can consider any cell type, there are no issues there. In terms of gene sets and their relevance, that's another story, of course. Indeed, there you are, let's say, limited to what is available, at least publicly, but you can add as many custom gene sets as you like to the analysis. So if, based on literature on a certain disease model, cell type or phenotype, you can assemble gene sets yourself to complement what is available in the public domain, these can very easily be added to the analysis and explored in exactly the same way. Having said that, with the gene set enrichment analysis approach and the gene sets available in the Molecular Signatures Database, those gene sets are pretty extensive across canonical pathways. But there's always the option, of course, that you're working with a certain cell type for which not a lot of public information or gene sets are available. And you may want to consider adding a few custom ones that are of interest to you.
Thank you. Our next question here is, what's the definition of pathway expression? Is it the average of gene expressions in that pathway? Is upregulation or downregulation taken into account?
Yes. What we typically report as a standard output is, for each of the pathways, an FDR value telling you if there is a significant enrichment of genes within the pathway among the genes that are up- or down-regulated in the condition. And on top of that, you get an enrichment score denoting whether the pathway is enriched among up-regulated genes or among down-regulated genes. Now that's the standard output. On top of that, using the tool that I showed you, you can really browse to every pathway that is enriched and look at the individual genes within that pathway to see which direction they are going in and what the magnitude of the change is. And any additional analysis you would like to do using that data, let's say you want to generate a mean expression of genes as a pathway score, or a score based on a rank-sum approach or whatever, all of this can be done as a custom analysis. So there are no limitations in how to look at the data. There are a number of things that are included in the standard readout, and there's a virtually unlimited number of analyses you can do on top of that.
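To make the idea of a custom pathway score concrete, here is a minimal Python sketch of two simple options mentioned in the answer: a mean of per-gene log2 fold changes over the genes in a pathway, and a rank-based score. Both functions, their names and the toy values are assumptions for illustration; they are not the enrichment score or FDR of the standard output.

```python
from statistics import mean


def mean_score(log2fc: dict[str, float], gene_set: set[str]):
    """Mean log2 fold change of the gene set members that were measured.

    Negative values indicate net downregulation, positive net upregulation.
    Returns None if no member of the gene set was measured.
    """
    values = [log2fc[g] for g in gene_set if g in log2fc]
    return mean(values) if values else None


def mean_rank_score(log2fc: dict[str, float], gene_set: set[str]):
    """Mean scaled rank of the gene set members among all measured genes.

    Genes are ranked from most down-regulated (0.0) to most up-regulated (1.0),
    so a score well below 0.5 suggests the set sits among down-regulated genes.
    Ties are not handled specially in this sketch.
    """
    ordered = sorted(log2fc, key=log2fc.get)
    if len(ordered) < 2:
        return None
    scaled = {g: i / (len(ordered) - 1) for i, g in enumerate(ordered)}
    ranks = [scaled[g] for g in gene_set if g in scaled]
    return mean(ranks) if ranks else None


# Toy log2 fold changes with placeholder gene names.
log2fc = {"g1": -2.0, "g2": -1.0, "g3": 0.0, "g4": 1.0, "g5": 2.0}
pathway = {"g1", "g2"}
print(mean_score(log2fc, pathway), mean_rank_score(log2fc, pathway))
```

A plain mean is sensitive to a few strongly changing genes, whereas the rank-based variant only uses the ordering, which is one reason both are sometimes looked at side by side.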
Thank you. Our next question is, what is the minimum and maximum number of samples that you can process with HTPathwaySeq?
Yeah. So a classic experimental setup includes 94 samples with four replicates and two internal controls, also in replicate. So 384 samples in total. This is the standard experimental setup, the most cost-effective one. You can go to multiples of 384 samples without losing efficiency in the cost. If you go to fewer than 384 samples, this is possible. Then we go to multiples of 96. But then it's less cost effective, so you get a slightly higher per-sample cost, of course, by reducing the numbers, because some of the work that needs to be done doesn't change a lot whether you do 96 samples or 384. So that's why it becomes a little bit less cost effective when doing 96. But it is feasible. We don't go lower than 96.
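The arithmetic behind the standard setup, (94 samples + 2 controls) x 4 replicates = 384, can be written as a small planning helper. The rounding rule below (round up to the next multiple of 96, with 96 as the minimum) is an assumption based on the answer above, and the function names are hypothetical.

```python
import math


def wells_needed(n_conditions: int, n_controls: int = 2, replicates: int = 4) -> int:
    """Total wells for the given conditions and controls, each in replicate."""
    return (n_conditions + n_controls) * replicates


def run_size(n_wells: int, block: int = 96) -> int:
    """Round a well count up to the next multiple of `block` (minimum one block)."""
    if n_wells <= 0:
        raise ValueError("n_wells must be positive")
    return max(block, math.ceil(n_wells / block) * block)


print(wells_needed(94))            # standard setup: 94 conditions, 2 controls, 4 replicates
print(run_size(wells_needed(40)))  # a smaller experiment rounded up to full 96-well blocks
```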
Thank you. Our next question here is, does the system work for cells cultured in 3D such as spheroids?
Yeah. It's a very good question. I think we have a number of those types of projects that are now in the pipeline, and what we typically do if there is, again, uncertainty about one, the lysis, and two, the sensitivity, is that we always do this small upfront pilot. And there are only two things that we need to verify. We need to verify that the cells are efficiently lysed, and that the RNA signal we can pick up from the lysate is sufficiently high for the method. These are the only two things we need to check. And this is a very small pilot experiment that doesn't even require sequencing. We use qPCR to do that evaluation on a very limited number of samples. And in a 3D culture system, you could ask yourself, well, is every cell accessible to the lysis buffer? Will every cell get efficiently lysed? So this is something that would need to be checked. And second, how many cells do you have in every well? Is it a tricky culture system where you only have like one spheroid consisting of X number of cells? Then the question is, is X high enough to be sufficiently sensitive? But again, these are the two things that we check upfront before rolling out the technology and applying it to hundreds of samples.
Thank you. Our next question is, do we need to put the same number of cells in each well? How do you deal with variation and sample concentrations across different wells?
So technically it's not required. If you want to evaluate, let's say, the impact of the compounds in relation to cell density, for instance, if that would be a question, then this is perfectly feasible. And there are, of course, a lot of data normalization steps involved to deal with the technical variability, starting from differences in density all the way down to differences induced during the library preparation and sequencing. So we normalize at various levels during the workflow to get rid of these types of differences. But technically, there is no absolute requirement to have exactly the same number of cells in each of the wells.
Thank you. We'll take one last question here. I'd like to ask regarding the sensitivity for primary human T-cells, and what's the minimum number of cells required for primary human T-cells?
Yeah. I can only speculate on the minimal number of cells required. It depends, as I mentioned before, on a number of factors: what is the RNA content, how active are these cells? You also need to take into account whether these cells are still dividing during the treatment or not. You could seed 5,000 cells and then harvest or lyse at 72 hours, but maybe have three cell divisions in between. Are cells dying from the treatments? This is also something you may want to take into account. You can start with 10,000 cells at time point zero, treat, and only have 1,000 cells left because you're killing the cells. This will also have an impact. So it's really on a case-by-case basis that we typically discuss this with customers: the experimental setup, what is expected from the treatments, what are the ideal culture conditions for your system of interest? This is also something you need to take into account. You cannot have your cells overgrow each other and have an impact on the phenotype, and also molecularly, which you don't want to see, just to get to the numbers that you would need to reach optimal sensitivity. So these are all things that need to be looked at and discussed with the customers on a project-by-project basis. I cannot tell by heart for primary T-cells. Again, I would advise a small pilot where we do two or three densities and a quick lysis and qPCR readout to evaluate an optimal cell density, taking into account also all the parameters that you need to take into account for the phenotypic readouts. That would be my advice then, if you have no idea what your optimal cell density would be: do a very small-scale pilot to map it. We have done primary cells before, not T-cells but other immune cells, without any issues and with a small pilot upfront, and this can be established relatively quickly.