RNA profiling has emerged as a powerful tool to investigate the biomarker potential of human biofluids. However, despite enormous interest in extracellular nucleic acids, RNA sequencing methods to quantify total cell-free RNA content are rare.
In this webinar, kindly hosted by Takara Bio, Prof. Jo Vandesompele, CSO at Biogazelle, discussed the performance of the SMARTer Stranded Total RNA-Seq method and showed:
Good morning. Thank you for joining this Takara Bio webinar. I am Matthieu Pesant, NGS specialist at Takara Bio. And today we are very honored to have Professor Jo Vandesompele giving this webinar on Total RNA Sequencing of Liquid Biopsies. Jo is the co-founder and Chief Scientific Officer at Biogazelle, which is a CRO specialized in high-value genomics applications that support pharmaceutical research, clinical trials, and diagnostic test development. He is also a Professor in functional cancer genomics and applied bioinformatics at Ghent University in Belgium. Jo is the author of more than 250 scientific articles in international journals, including some pioneering publications in the domain of RNA quantification and non-coding RNA, and he has an H-index of 66. So we are very honored to host this webinar for Jo Vandesompele. During the webinar, if you have questions, please type them in the question box, and we will answer them at the end of the webinar. So now I'm handing over to Jo for the presentation. Welcome, Jo.
Thank you very much, Matthieu, and thanks for having me. So first of all, I would like to wish you and your colleagues and your families well in these challenging COVID-19 times. And while I am really looking forward to this webinar, the timing is somewhat unfortunate because at Biogazelle, we're in the middle of setting up a large-scale SARS-CoV-2 qPCR testing capacity for the Belgian government. The only link between this statement and my talk is the fact that both deal with human biofluids. Before I start, I would like to thank my colleagues at Biogazelle and Ghent University and also Takara Bio for hosting this webinar and for giving us pre-launch access to the Pico v2 kit for total RNA sequencing for low-input fragmented RNA. I would also like to explicitly acknowledge many people from different consortia, like the Human Biofluid RNA Atlas, the Extracellular RNA Quality Control Consortium, the collaborators of the murine PDX experiment that I will discuss, and the co-authors on our Total RNA Sequencing method paper published in Scientific Reports last year.
Last month, I did a quick literature survey on the liquid biopsy space, and the analytical liquid biopsy field is clearly dominated by DNA applications, as you can see by the large green square at the top right, with 52% of the articles studying DNA, of which five specifically looked at DNA methylation. RNA is clearly lagging behind with only 16% of the studies focusing on RNA. And within that group of publications, mainly microRNAs, the light pink part, are studied, in about two-thirds of the cases studying RNA.
There may be several reasons for the observed lack of enthusiasm for the analysis of RNA in liquid biopsies. The first is the incorrect belief that RNA is completely degraded and cannot be measured. Please pay attention to the fact that cell-free DNA is also fragmented, to a size of about 160 nucleotides, with a short half-life in circulation of a few hours. The second is the lack of methods for true, precise, and sensitive measurements of fragmented RNA. And the third is the much larger dynamic range of RNA abundance, with four to five orders of magnitude, making profiling of RNA technically much more challenging.
When we look at the size distribution of the purified extracellular RNA fragments from human blood plasma, it is clear that the majority are short fragments with a peak size of about 90 nucleotides. Here you can see the FEMTO Pulse electropherogram of RNA purified from platelet-rich plasma from a healthy donor using the miRNeasy serum/plasma kit, where we loaded the equivalent of about 30 microliters platelet-rich, plasma-derived RNA. Also, note the likely presence of intact ribosomal RNA, presumably coming from platelets, which are known to contain high concentrations of RNA. People often ask me about the stability of RNA, but there are several angles to that question. So do they mean the in vivo half-life of the extracellular RNA space? Do they mean the RNA, the stability in the collection tubes or in the stored biofluid or once purified? For the latter, the purified RNA, we know that RNA is extremely stable if properly treated. When stored as a fluid, it is recommended to avoid repeated freeze/thaw cycles. But in general, it also appears to be relatively stable in the fluid. For the tube, I will show results later that the time in between the blood draw and the plasma preparation is important. And finally, the in vivo stability of extracellular RNA is largely uncharted terrain. I also want to mention that RNA is likely also protected by different mechanisms, such as enclosures in vesicles like exosomes, micro-vesicles, apoptotic bodies, or platelets, or bound to lipoprotein complexes, such as Ago2 or high-density lipoproteins. It is also likely that naked RNA and especially circular RNA is freely circulating. Circular RNA is a relatively novel class of RNA, so originating through back splicing of otherwise linearly spliced transcripts. And as the circular RNAs do not have free 5' or 3' ends, they are largely resistant towards exonucleases.
Of course, to handle this type of extracellular RNA, one needs sophisticated methods. Biogazelle is a service provider focusing on RNA biomarker discovery and validation, and it has developed RNA sequencing methods to run in its quality-controlled lab to handle any type of RNA molecule. Here you see a schematic drop of fluid with different RNA biotypes. On the right-hand side, we have library preparation methods for small RNAs. And on the left side, you see various methods to sequence either all long RNAs, also termed total RNA, or the 3' ends of polyadenylated genes, or hybrid probe-capture-based target sequencing of all messenger RNAs or lncRNAs. Today, we're going to focus on a method called total RNA sequencing that has the ability for unbiased sequencing of all long RNA fragments.
Of note, total RNA sequencing is, in fact, a misnomer. While virtually all RNA is sequenced, this is not true for the small RNAs like microRNAs. Importantly, the biggest challenge of total RNA sequencing is the removal of abundant and unwanted ribosomal RNA that can make up 90% or more of the RNA. So the method that we use for total RNA sequencing in the lab, both at the University and in the company Biogazelle, is Takara's SMARTer Stranded Total RNA Sequencing Kit, Version 2, Pico Input Mammalian. A mouthful, so from now on I will refer to it as the Pico v2 Kit. It can effectively turn both linear and circular RNAs into a sequencing library, works on highly fragmented RNA, and can handle both polyadenylated and non-polyadenylated transcripts. Here you see the schematic of the method. It starts with reverse transcription at the top left, at the 3' end, using a tailed random primer, whereby a template switch oligo at the 5' end extends the RNA molecule with a known sequence. A very clever procedure. The template switch oligo is an oligo that hybridizes to untemplated C nucleotides that are added by the reverse transcriptase during reverse transcription. The resulting cDNA then undergoes limited-cycle PCR, incorporating Illumina adapters and barcodes. After a few cycles of PCR, we apply the ZapR proteins and probes to specifically degrade the cDNA derived from ribosomal RNA, which you can see in the middle of the right panel, after which the intact cDNA fragments, which are the cDNA molecules derived from the RNA molecules of interest, are further amplified. They're cleaned and made ready for sequencing. And we typically sequence between 10 and 20 million reads for various biofluids.
So when producing a new method in the lab, one needs to carefully assess its performance, but that's, of course, easier said than done. So while there are numerous metrics, including ease of use, scalability, cost etc, we believe that precision, analytical sensitivity, quantitative accuracy, and of course, fit for purpose are paramount. You don't want a method that is not robust nor reproducible. You do want the detection of a high number of genes, this is the analytical sensitivity, and of course, excellent quantitative accuracy, assessed by trueness or correspondence between the expected and the observed fold changes or correspondence between expected and observed RNA concentrations.
The Pico v2 method, as you can see here, is characterized by excellent repeatability. Here you can see RNA isolation replicates of platelet-free plasma from an EDTA blood collection tube. Please pay attention that this is the total workflow variation that you see because we're including RNA extraction replication, library preparation, and sequencing. In this scatter plot, the green dots are the genes that we define as reproducibly detected above a threshold of at least four counts, and the blue dots are genes below that threshold. This detection cut-off is data driven, and it removes 95% of so-called single positive replicate data points that only show up in one of the technical replicates, as originally proposed in our Nature Methods paper on the microRNA Quality Control study. The Pearson correlation coefficient, as you can see, is quite high of about 0.948, and typically, 5,000 to 8,000 genes are reproducibly detected, depending, of course, on the type of biofluid, the volume of fluid in the input, and the sequencing depth.
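The data-driven cut-off described above can be sketched in a few lines. This is only an illustration under stated assumptions: a "single positive" is taken to be a gene with a non-zero count in one replicate and zero in the other, the function names `detection_cutoff` and `reproducibly_detected` are hypothetical, and the toy counts are made up. It is not the procedure from the cited paper.

```python
def detection_cutoff(rep1, rep2, percent=95):
    """Smallest count threshold that removes at least `percent` percent
    (1 to 100) of single positives, i.e. values that are non-zero in one
    replicate and zero in the other. A value is removed when it lies
    below the threshold."""
    singles = sorted(
        max(a, b) for a, b in zip(rep1, rep2) if (a == 0) != (b == 0)
    )
    if not singles:
        return 1  # nothing to filter: any non-zero count is kept
    # Number of single positives that must fall below the threshold
    # (integer ceiling of percent * n / 100).
    needed = max(1, -(-percent * len(singles) // 100))
    return singles[needed - 1] + 1


def reproducibly_detected(rep1, rep2, cutoff):
    """Count genes whose counts reach the cutoff in both replicates."""
    return sum(1 for a, b in zip(rep1, rep2) if a >= cutoff and b >= cutoff)


# Toy counts for 12 genes in two technical replicates (illustrative only).
rep1 = [1, 2, 3, 3, 0, 0, 0, 0, 0, 0, 100, 50]
rep2 = [0, 0, 0, 0, 1, 1, 2, 2, 3, 40, 90, 60]
cutoff = detection_cutoff(rep1, rep2, percent=90)
print(cutoff, reproducibly_detected(rep1, rep2, cutoff))
```

With real data the threshold would be derived per experiment, which is why it can differ between datasets.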
Because of the challenging nature of biofluids, with their low input, varying RNA content, and degradation status, and, of course, also because of processing-induced variability of RNA extraction and library preparation, spike-in RNA controls using synthetic RNAs are crucial. We typically add a complex set of 78 sequin spikes to the lysate prior to RNA extraction. These undergo the same variability as the sample RNA during RNA extraction. And we add 92 probably better-known, linearly spliced, simple ERCC spikes during library preparation. This combination of spike-in controls over a relevant dynamic range enables us to measure both RNA purification efficiency and determine the relative or absolute concentration of RNA per volume unit of fluid. Important to note is that the fraction of reads mapping to spikes is inversely proportional to the endogenous RNA content. And we believe that normalization against spikes added to the lysate proportional to the volume of the fluid is the most meaningful way to measure RNA concentration per volume of fluid. And by assessing the sequin-over-ERCC ratio, we can effectively determine the extraction efficiency. These are crucial metrics to understand potential issues during the workflow.
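As a rough sketch of the two spike-based metrics just described, the snippet below computes a relative RNA concentration (endogenous reads per sequin read) and a sequin-over-ERCC ratio as a proxy for extraction efficiency. The function names and the read counts are hypothetical and only serve to show the arithmetic.

```python
def relative_concentration(endogenous_reads, sequin_reads):
    """Endogenous reads per sequin read. Because sequins are added to the
    lysate in proportion to the fluid volume, this ratio tracks RNA
    concentration per volume of fluid (in arbitrary units)."""
    return endogenous_reads / sequin_reads


def extraction_efficiency_proxy(sequin_reads, ercc_reads):
    """Sequin-over-ERCC read ratio. Sequins go through RNA extraction,
    ERCC spikes are added afterwards, so a lower ratio suggests more
    loss during extraction."""
    return sequin_reads / ercc_reads


# Hypothetical read counts for two samples sequenced to the same depth.
samples = {
    "fluid_A": {"endogenous": 8_000_000, "sequin": 1_000_000, "ercc": 1_000_000},
    "fluid_B": {"endogenous": 2_000_000, "sequin": 4_000_000, "ercc": 4_000_000},
}
for name, reads in samples.items():
    conc = relative_concentration(reads["endogenous"], reads["sequin"])
    eff = extraction_efficiency_proxy(reads["sequin"], reads["ercc"])
    print(f"{name}: relative concentration {conc:.2f}, sequin/ERCC {eff:.2f}")
```

In this toy example the sample with fewer endogenous reads has the larger spike fraction, which is the inverse relationship mentioned above.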
We can also use these spikes to create artificial RNA samples with built-in truth to assess quantitative accuracy. Remember, one of the metrics that we are fond of to assess the performance of a method. Here we added different spike-in mixes in varying amounts in a background of human plasma from a healthy donor. Sequin spikes and ERCC spikes were diluted in opposite order, as you can see on the left, in a biologically relevant four-fold dynamic range for both the sequin and ERCC spikes. We purposely selected a narrow range to determine the method's ability to detect smaller differences. It's always easy to detect big differences. And of note, both sequin and ERCC spike mixes consist of multiple RNA molecules, 70 to 90, as shown on the previous slide, at different concentrations. Overall, as you can see on the right, there is a very strong correlation between the expected and the observed fold changes in the synthetic RNA samples with built-in truth, with Pearson correlation coefficients of 0.883 and a slope of about 1.0. And this is quite good considering that the spikes were added at very low concentrations, similar to plasma RNA.
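The comparison of expected and observed fold changes can be expressed as a Pearson correlation plus a regression slope. The sketch below uses plain Python and made-up log2 fold changes within a four-fold range; `pearson_and_slope` is a hypothetical helper, not part of any pipeline mentioned in the talk.

```python
import math


def pearson_and_slope(expected, observed):
    """Return (Pearson r, least-squares slope of observed on expected)."""
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Hypothetical log2 fold changes for a handful of spikes (illustrative only).
expected = [-2.0, -1.0, 0.0, 1.0, 2.0]
observed = [-1.8, -1.1, 0.2, 0.9, 2.1]
r, slope = pearson_and_slope(expected, observed)
print(f"Pearson r = {r:.3f}, slope = {slope:.2f}")
```

A slope near 1 indicates that fold changes are neither compressed nor inflated, while the correlation coefficient captures the scatter around that line.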
Fit for purpose, what does that mean? Well, I don't have any metrics or tables I want to share today, but it means that you have good mapping rates, which, with this method, really depends on the biofluid type, because also the varying presence of exogenous RNAs, such as RNA from viruses or bacteria, of course, will reduce the human mapping rate. We also want good strandedness, and this method can handle it. It shows there is no remaining DNA contamination in our RNA eluate, and the method allows to differentiate sense and anti-sense overlapping transcripts, which is key because we know that the human genome is pervasively transcribed with more and more genes showing the sense/anti-sense overlapping nature. There's a low level of nuclear ribosomal RNA reads, so the ZapR protein and the probes really work. But sometimes we notice an excess of mitochondrial ribosomal RNA, which is sometimes problematic, but this depends on the biofluid type. So the method may need a little bit of improvement to remove mitochondrial ribosomal RNA. There is a sizeable intronic read fraction, which allows us to assess post-transcriptional regulation, and it allows the detection of both polyadenylated and non-polyadenylated transcripts. And excuse me for the typo here. Also, long non-coding RNAs and circRNAs are effectively quantified in addition to protein-coding mRNAs.
I think most of us agree that liquid biopsies are an important tool to reach precision medicine's goals. But did you also know that at least 20 different human biofluids exist, and they all contain cell-free RNA in varying degrees? In a joint effort between Ghent University and Illumina, we have created a Human Biofluid RNA Atlas. That's an attempt to deeply probe into the extracellular transcriptome of 20 different biofluids.
And when we apply our methods to these 20 biofluids, we see the following picture of relative RNA concentrations per unit of fluid volume. There's a striking difference of more than a thousand-fold in RNA concentration among the fluids, where aqueous humor, CSF, cerebrospinal fluid that is, and sweat are among the fluids with the lowest RNA concentration. And breast milk, seminal plasma, and tears among the fluids with the highest RNA concentration. We're currently investigating how well we see RNA signals from the producing or transporting tissues and cells in these fluids and which fluids could be used for diagnosing or monitoring human diseases.
And as one example, I want to show this result. I do understand that plasma is the most popular liquid biopsy fluid, but it may not always be the most relevant one to study a particular disease. Here you can see pilot data from a sequencing study on 200 microliters of matched urine and plasma from 20 high-risk or de novo metastatic prostate cancer patients and controls. And it's very clear that all prostate tissue or prostate cancer-specific genes are much more abundant in urine compared to plasma, as depicted by the more intense reddish color in the urine samples. So it may just be much more appropriate to profile urine, in this context at least, to study prostate cancer.
There is an emerging literature on blood plasma RNA profiling. However, the pre-analytical steps are widely different. We did some preliminary testing in the lab and immediately noticed huge effects depending on, for instance, the RNA extraction kit that was used, the library prep method that was used, or the blood collection tube that was used. We therefore randomly selected 100 publications from the last two years and assessed how well the authors described presumed important pre-analytical variables, such as the blood collection tube, the plasma preparation SOP, etc. The conclusions were astonishing, that less than 10% actually report the variables that, in our eyes, seem important to understand and replicate the study.
This prompted us to initiate a large-scale collaborative study among Ghent University, Illumina, and Biogazelle, coined The Extracellular RNA Quality Control Study, or in short exRNAQC. Here you see that study in three phases, where we have just conducted the first part. It's an almost two-year effort with more than 10 researchers trying to meticulously study these pre-analytical variables. I will only show you a few cases or a few glimpses of results, mainly focusing on the impact the RNA extraction kit has or the blood collection tube has on the results.
Here you see a summary of the results, and we have many different performance metrics, but I only show here the number of genes that are reproducibly detected in platelet-free plasma from healthy human donors. The input varied from 0.1 to 5 ml of plasma per the kit recommendations. The eluate volume was also widely different among kits, from only 14 microliters up to 100 microliters. The method that we used here was not the Pico v2, but messenger RNA capture sequencing, but the results would have been very similar. And we sequenced 24 million paired-end reads. Here we applied the 5 read cut-off, and you see that some kits go up to 12,000 genes reproducibly detected, while others stick at 2,000 to 3,000 genes. So there is a 138-fold difference in RNA concentration observed, a 30-fold difference in RNA yield, and a six-fold difference in the number of detected genes, only depending on the RNA purification kit. So you had better make sure you work with the right kit for your application. Of note, the differences that I show here are much less pronounced when you're studying microRNAs. We assume that this is because the kits are probably optimized to study microRNAs: most of the suppliers did not realize until recently that mRNA is also abundantly present in biofluids, so their kits have not been very well optimized to study larger RNA fragments.
And the second part of some data I want to show you in the Extracellular RNA Quality Control Study is the assessment of blood collection tubes and the time-to-process. In this study, we drew blood from three healthy donors. We assessed three different time points with 10 different tubes: a classic serum tube where the blood is coagulated, four non-preservation tubes, two EDTA and two citrate tubes, and then five so-called preservation tubes that claim that you can have the blood for up to one week at room temperature before you process. We did deep sequencing of both total RNA and small RNAs, and we'll present some results on the total RNA.
A striking observation was that a serum tube typically recognized or considered as not suitable for RNA applications, especially not for large RNA like mRNA and total RNA, and EDTA plasma tubes are very similar when it comes to RNA biotype composition when we look at total RNA, where we see that among the donors, among the time points, the distribution of the measured RNA biotypes ranging from protein-coding genes or lncRNA genes, etc, are very similar. So this, at least for us, indicates that serum-derived RNA can effectively be used for liquid biopsy-based RNA profiling. The picture is very different when we look at microRNAs, where indeed serum has a widely different RNA repertoire compared to EDTA with tRNA fragments and pvRNA suddenly emerging in serum and not in plasma.
To make a very long story short, we assessed various metrics, and the most important criterion to select a tube for the second phase of the study was stability of performance, meaning that it doesn't matter how long you wait between the blood draw and the plasma preparation: you still obtain similar results. Here I show you the results for five important metrics that we believe are crucial and should remain stable over time: reproducibility of gene expression, the distribution of the different RNA biotypes, the number of detected genes, the RNA concentration, and the lack of hemolysis. Hemolysis is a phenomenon whereby red blood cells burst and lyse and the RNA content of the cells comes into circulation or into the plasma, which you don't want. What is clear is that these so-called preservation tubes actually don't perform very well: the mean fold-change of these important performance parameters increases dramatically over time, while that is not the case, or much less so, for the non-preservation tubes: serum, EDTA, and citrate.
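One way to summarize stability over time as a mean fold change is sketched below. The exact definition used in the study is not given in the talk, so this is an assumption: each later time point is compared with the first one, and the fold change is expressed as a ratio of at least 1 regardless of direction. The function name and the numbers are hypothetical.

```python
def mean_fold_change(values):
    """Mean fold change of a metric relative to its first time point.

    `values` holds one positive measurement per time point; the first
    entry is the baseline. Increases and decreases are treated alike,
    so a perfectly stable metric gives 1.0.
    """
    baseline = values[0]
    folds = [max(v, baseline) / min(v, baseline) for v in values[1:]]
    return sum(folds) / len(folds)


# Hypothetical number of detected genes at three processing time points.
stable_tube = [6000, 6100, 5900]
unstable_tube = [6000, 3000, 1500]
print(round(mean_fold_change(stable_tube), 3))
print(round(mean_fold_change(unstable_tube), 3))
```

A tube whose metrics stay close to 1.0 across time points would be considered stable under this definition.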
In conclusion for that part, I think it's not prime time yet for blood collection preservation tubes for various reasons that I've already mentioned. There's compromised precision, low and varying RNA levels over time, different RNA biotype composition over time, problems with some tubes to remove contaminating DNA probably because of inhibition of the DNase, and in general, higher and increasing levels of hemolysis. If you would ask me right now what tube you would recommend for RNA-based liquid biopsy studies, we would say quickly prepared serum or EDTA plasma, ideally prepared within four hours between blood draw and plasma preparation.
To end my presentation, I want to go over three different case studies where we used the Pico v2 kit for liquid biopsy profiling. The first one is the assessment of tumor-educated platelets as a novel concept in liquid biopsies. Some of you may be familiar with the concept of tumor-educated platelets as published in several top-rank journals. In essence, it was shown that, on the one hand, RNA from the tumor may end up in platelets via vesicle-mediated transfer, and on the other hand, the splicing pattern of platelet RNA is different once exposed to tumor. To better understand the possible importance of platelets to study tumor-derived RNA, because much of the work we do is focusing on precision oncology, we are currently conducting several studies. Today, I want to show you the results of a first proof-of-concept study in which we analyzed blood from five PDX mice with a breast cancer tumor. Upon killing the mice, we succeeded in preparing from each mouse only 70 microliters of either platelet-free, platelet-poor, or platelet-rich plasma, with increasing concentrations of platelets as depicted by the more intense yellowish color. Obviously, we also added the spikes, as explained before, before and after RNA purification, and performed total RNA sequencing.
I believe this is quite an elegant experiment as the total RNA sequencing method, so the Pico v2, enables us to sequence both human and murine RNA at the same time. And using the spikes, we can effectively quantify in which plasma fraction the highest human RNA signal resides to assess for ourselves whether it makes sense to study platelets in a liquid biopsy context for precision oncology. Please note that this is technically a very challenging experiment, working with only 70 microliters of plasma and using only half of it for the library prep and mapping on two different genomes simultaneously. Considering there is quite some homology between human and mouse RNA, this is quite challenging. Here you see that there are substantial differences among the mice. Each line is a different mouse. In the middle panel, you can see that the RNA content or the total RNA content is surely increasing as a function of platelets levels, with the higher RNA concentration in the platelet-rich plasma, as expected, because platelets are known to contain a lot of RNA. On the left, you can notice lower fractions of human RNA in function of the platelet level, in line with our hypothesis that platelets contain, again, a lot of murine RNA. Most importantly, on the right, we again observe huge differences among mice, but it seems that the levels of human RNA do not substantially increase with platelet levels, if not decrease. According to us in this pilot experiment, this suggests that the majority of human tumor-derived RNA is effectively outside platelets. As this was only our pilot study, we have confirmed now these results in another murine model with cellular xenografts. Stay tuned for more updates.
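The distinction between the human fraction and the spike-normalized human RNA level can be made concrete with a small sketch. The read counts below are invented to mimic the pattern described (more murine RNA with more platelets, roughly constant human RNA); the helper names are hypothetical.

```python
def human_fraction(human_reads, mouse_reads):
    """Share of endogenous reads that map to the human genome."""
    return human_reads / (human_reads + mouse_reads)


def spike_normalized(reads, spike_reads):
    """Reads per spike read, a proxy for RNA level per volume of plasma."""
    return reads / spike_reads


# Hypothetical read counts per plasma fraction of one mouse (same depth).
fractions = {
    "platelet_free": {"human": 1000, "mouse": 9000, "spike": 10000},
    "platelet_rich": {"human": 250, "mouse": 17250, "spike": 2500},
}
for name, reads in fractions.items():
    total = spike_normalized(reads["human"] + reads["mouse"], reads["spike"])
    human = spike_normalized(reads["human"], reads["spike"])
    frac = human_fraction(reads["human"], reads["mouse"])
    print(f"{name}: total {total:.2f}, human {human:.2f}, human fraction {frac:.3f}")
```

In this toy example the total RNA level rises and the human fraction drops in platelet-rich plasma, while the spike-normalized human level stays the same, which is the kind of pattern that would point to tumor-derived RNA residing outside platelets.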
A second case study is the application of the Pico v2 method to study matched fluid and the derived extracellular vesicles. We definitely want to use the term extracellular vesicles in contrast to exosomes because exosomes infers some information on the biogenesis of these vesicles, which is often not true. My colleague, An Hendrix, used density and size-based purification methods to purify EVs from fluids. As fluids, we used conditioned medium from breast cancer cells growing on plastic plates, platelet-free plasma collected from a citrate tube from a healthy donor, and then urine also from a healthy donor. What you can observe is that there are varying biotype contributions not only among the biofluids, but also between the biofluid and the derived EV or the extracellular vesicles.
If we look at the RNA concentration using the spikes and corrected for biofluid input volume, because for purification of vesicles you need much larger volumes, 2 up to 45 ml, for instance, of urine for purification of vesicles, while for the total fluid we only used 200 microliters, so this is volume-corrected RNA concentrations in a log scale. And you again observe huge differences among fluids, but also between fluids and EVs. For instance, the green dots, conditioned medium and conditioned medium EVs, there's a more than thousand-fold concentration in total RNA in these vesicles compared to the pure fluids. The same is true for the... But the difference is a little bit less pronounced in platelet-free plasma, versus the platelet-free plasma-derived EVs. But surprisingly, in urine the difference is much less pronounced, where the RNA concentration in the vesicles is almost the same, well, one order of magnitude lower compared to total urine. So it seems that not only the method is capable of studying the RNA cargo in vesicles, but also that it provides first glimpses in differences among fluids and differences among fluids and their derived EVs.
We did some further exploration to study the differences between the RNA cargo in the fluid and the EVs, and we see striking differences. At the bottom, you see the Venn diagram overlap between the detected genes in pure urine and urine EVs, and you see that the majority of the genes are detected in both compartments with also very good reproducibility, as exemplified by the scatter plot on the right. On the top, you see a completely different picture. This is human platelet-free plasma, where the majority of the genes that are detected in the EVs are also detected in the fluid, in the pure fluid, but the pure fluid contains many more RNAs. And also the reproducibility or the concordance, I would say it's not reproducibility but concordance, between RNA cargo in EVs and fluid is markedly different, suggesting a potential specific sorting mechanism of particular RNA molecules or pathways into EVs. This is, of course, speculation. More research is needed to better understand this phenomenon.
And the last case, I want to show you colon cancer in the metastatic setting, where we had access to samples from a longitudinal study. Small numbers, a pilot study, but I think it proved the case and the utility of extracellular RNA profiling in precision oncology. We have three patients collected at different time points. This was platelet-poor plasma, and the patient received chemotherapy and anti-VEGF or anti-EGFR therapy.
One of the questions that we asked is how does the RNA in the plasma change when patients receive chemotherapy? What you see here is a volcano plot of the resulting RNA patterns at the time of CT scan when the patients have received the chemotherapy versus the initial diagnosis, and you see many large differences. The most significant difference in terms of significance and magnitude is the POLB gene. It's the most up-regulated gene in patients on the time points at which the patients receive the chemotherapy. In the recent literature, this gene is known to be involved in mismatch repair. It's known to be up-regulated upon chemotherapy in tissues and cells, bear in mind we're looking here at plasma, and it's mutated in colon cancer. So it absolutely makes sense that we see this dramatic up-regulation of this RNA in the plasma from patients treated with chemotherapy. You might say, "Well, chemotherapy is changing so much. Can you also discern more subtle relevant patterns?"
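For readers less familiar with volcano plots, each gene is placed at its log2 fold change (x-axis) and the negative log10 of its p-value (y-axis). The sketch below shows that transformation only; the pseudocount, the gene labels, and all values are illustrative assumptions, and the statistical test that produces the p-values is outside its scope.

```python
import math


def volcano_point(mean_treated, mean_baseline, p_value, pseudocount=1.0):
    """Return (log2 fold change, -log10 p) for one gene.

    A pseudocount is added to both means so that genes with zero counts
    in one condition still get a finite fold change.
    """
    log2_fc = math.log2((mean_treated + pseudocount) / (mean_baseline + pseudocount))
    return log2_fc, -math.log10(p_value)


# Hypothetical genes: (mean on chemotherapy, mean at diagnosis, p-value).
genes = {
    "GENE_A": (7.0, 1.0, 0.001),
    "GENE_B": (10.0, 10.0, 0.8),
}
for name, (treated, baseline, p) in genes.items():
    x, y = volcano_point(treated, baseline, p)
    print(f"{name}: log2FC = {x:.2f}, -log10(p) = {y:.2f}")
```

Genes that are both strongly changed and highly significant end up in the upper left or upper right corner of the plot.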
So we did take a look at the signals or reflections of pathway inhibition upon treatment with anti-VEGF therapy or VEGF inhibitor. Bear in mind that this was only one patient, and it clearly shows the power of an enrichment analysis. So what you see here is that we provide bioinformatics evidence that the VEGF pathway is inhibited or less abundant in the plasma when patients receive anti-VEGF therapy. So this hints at a putative pharmacodynamic biomarker when looking at extracellular RNA in patients treated with a targeted agent.
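The talk does not say which enrichment method was used. As one simple possibility, an over-representation test asks whether the genes that go down under treatment contain more members of a pathway than expected by chance, using the hypergeometric distribution. The sketch below is an assumption-labeled illustration with invented numbers, not the analysis behind the slide.

```python
from math import comb


def hypergeom_pvalue(overlap, pathway_size, list_size, universe_size):
    """P(X >= overlap) when drawing `list_size` genes without replacement
    from a universe of `universe_size` genes, of which `pathway_size`
    belong to the pathway."""
    total = comb(universe_size, list_size)
    upper = min(pathway_size, list_size)
    favorable = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / total


# Hypothetical numbers: 8,000 detected genes, 50 in the pathway,
# 200 genes lower on treatment, 10 of which are pathway members.
p = hypergeom_pvalue(overlap=10, pathway_size=50, list_size=200, universe_size=8000)
print(f"enrichment p-value: {p:.2e}")
```

Rank-based approaches that use all genes rather than a thresholded list are a common alternative, and with a single patient any such result is best read as hypothesis-generating.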
Of course, you can do much more detailed analysis, and we see all kinds of interesting patterns emerging, such as differences in DNA mismatch repair, proliferation and anti-proliferation signals, T-cell stimulatory signals, inflammatory response, etc, and they all make sense in the context of cancer or cancer treatment. Importantly to note is that by looking at extracellular RNA from liquid biopsies, we don't only look into the tumor-intrinsic RNA, but also have glimpses into tumor-extrinsic differences, such as how the immune system reacts to the tumor or the treatment or if the treatment gives some particular toxicity to organs. If a treatment creates, for instance, a toxicity to the heart, well, heart cells will burst and lyse and die, and the heart-specific RNAs will enter circulation, and you can pick up that signal of cardiac toxicity by an increase in RNA coming from heart-specific RNAs.
To end my webinar today, I also want to show you the utility of the Pico v2 SMARTer method outside liquid biopsies. Last year, we published in Nucleic Acids Research that the method can also be used to study single cells, as it is so sensitive. So we sequenced about 450 cells, 1.5 billion reads. Encouragingly, less than 3% of the reads were mapping to ribosomal RNAs, indicating that ribosomal RNA removal using the ZapR proteins and probes worked very well for single-cell RNA sequencing. We did notice 20% of intronic reads of immature unspliced RNAs, and this really enabled so-called velocity analysis. Velocity analysis is a bioinformatics approach that allows us to study expression dynamics by looking at mature and immature RNA. With one million reads per cell, we detect more than 5,000 genes by at least four reads in a cell, and that's typically many more than with other methods, the majority being messenger RNAs, but also novel genes, polyA-minus genes, so without a polyA tail, and circular RNAs, which escape detection by more conventional RNA sequencing methods.
And finally a last slide on the utility of the method to study FFPE tissue, where you see excellent repeatability, here shown for the same sample using 10 or 30 nanograms of input, showing that we do have the expected fractions of RNA reads mapping to exonic regions, intronic regions, and intergenic regions, and that we do detect in a reproducible manner a large number of protein-coding genes as well as long non-coding RNA genes and other genes that would escape detection if you would resort to non-total RNA sequencing methods. Bear in mind that in FFPE, because of the nature of these challenging samples, we do observe a remaining ribosomal RNA fraction of about 10 to 17%, but this is definitely on par with other methods.
With that, I would like to conclude my webinar. I want to state, or I hope I convinced you, that all human biofluids contain RNA, likely reflecting health and disease states. We developed and benchmarked the SMARTer Pico v2 for extracellular RNA profiling. Bear in mind that optimization and standardization of pre-analytical steps remain key to success. And the few case studies that I presented hint at early clinical validity of extracellular RNAs, amongst others, as a potential pharmacodynamic biomarker. As a last statement, I want to say that we consider SMARTer Pico v2 as a kind of Swiss knife, because it enables us with the same workflow to study single cells, FFPE tissue, and biofluids. And with that, I thank you for your attention, and I'm happy to take questions.
Thank you very much, Jo, for this very nice presentation. Very inspiring. So now we will take some time to answer your questions. So starting with the first one:
What is the role of cell-free RNA in biofluids?
Excellent question. It's not entirely clear yet. While RNA has a documented role in intercellular communication as a so-called new class of hormones, we can of course not rule out that some or perhaps the majority of the RNA simply reflects cellular waste or a by-product of other processes like dying cells. However, I want to indicate that whatever its function, it can effectively be exploited as a biomarker. According to many schools of thought, you don't need to understand where a biomarker is coming from or what it does as long as you can show its specificity and sensitivity. I hope that answers the question.
Yes, thank you. Another question: Would you recommend to study platelet-free or platelet-rich blood plasma?
Well, I briefly discussed the tumor-educated platelet hypothesis, and upon reading these papers, we were originally convinced that it was better to study platelet-rich plasma, at least in a precision-oncology setting because we want to study the tumor, and if the tumor releases the RNA that ends up in the platelets, you want to have as many platelets as possible. However, our two pilot experiments indicate that the RNA concentration of the tumor in the xenograft model, which is an elegant model to study that, is not necessarily higher, if not lower, in platelets. So this really prompted us to reconsider, and at this stage, we favor platelet-free plasma as we now have a much better view of all different cell types and organs that release RNA into circulation, and not necessarily zoom in so much on platelet-derived RNA.
All right. Next question: What is the RNA concentration in the liquid biopsy eluate?
To be honest, I don't know, because we never measure the RNA concentration in the eluates. We have stopped doing that. So most of the work we do is using 200 microliters of, say, urine or plasma, and the typical measurement methods, such as absorbance spectrophotometry as well as fluorometric measurements, are not sensitive enough for a reliable, robust measurement. So we simply add spikes during extraction and to the RNA eluate, and we use the spikes to monitor our process and to correct for variables in RNA input amounts.
Okay. Another question: I use Trizol coupled to Monarch RNA kit to isolate EVs' RNA from pure fluids, and after the isolation of RNA, I precipitate with ethanol. I get good qPCR results, but the 260/280 ratios are very low. How can I solve this?
The bold answer is don't worry. You don't need to solve that problem because it is not a problem. I never look at 260/280 ratios for various reasons. One is this ratio is very insensitive to contamination of proteins or DNA. You really have to supply large amounts of proteins or DNA to affect that measurement. More importantly, that ratio is only meaningful if it's measured in pH-neutral solutions. So if, for instance, you would re-elute your RNA after precipitation or even if you elute it from a column in water, it's known that water becomes acidic over time, and then these ratios will change. In my experience, these ratios have never been predictive for quality for either qPCR or RNA sequencing. So I would reiterate don't worry.
A question on a future project on non-small cell lung cancer. Can I perform NGS from biopsies of this cancer type and validate the NGS data for microRNAs by using a plasma sample? And there's an associated question. I would use RT-qPCR to validate those microRNA expressions.
Good question. There's not so much literature on matched tissue, if I understand well, the biopsy tissue profiling and fluid profiling. As a coincidence, we have done similar studies in the field of esophageal cancer, where we had the fluids, the plasma, and the matched diseased and healthy tissue from the cancer or the pre-cancerous lesions. We're in the midst of investigating that, and we cannot make any firm conclusions, but of course, you can study that, and you can perfectly combine sequencing with qPCR to confirm. And whether you're focusing on microRNAs or long RNAs, I think it doesn't really matter. We have a small preference for the longer RNAs using this total RNA sequencing methodology because the number of biomarkers we obtain is typically one to two orders of magnitude larger, with microRNAs typically around 400 to 600 in a liquid biopsy and messenger RNAs 6,000 to 10,000. What we have done as a final comment to that question is we have preliminary evidence that the mutant RNA from the cancer can be detected in the plasma. We cannot make any statements on the sensitivity of using plasma to also detect mutations and other structural variants, but it is possible. I hope that answers the question.
Thank you, Jo. A question on what is the percentage of circulating RNA both cell-free and bound in vesicles? How do their stabilities compare?
Well, stabilities we don't know. We haven't done any stability analysis. We are planning to do SLAMseq experiments, where we will do metabolic labeling and see over time how new nascent RNA ends up in the circulation, and then if we stop the metabolic labeling, it disappears. And upon doing fractionation, like purifying vesicles, looking at RNAs bound to proteins or freely floating, we will be able to do a more thorough assessment of the stability. When it comes to the RNA concentration, that was also part of the question if I understood it well, it's our understanding that the majority of the messenger RNA, I want to make it clear messenger RNA, is outside vesicles. The large majority is not contained in vesicles. I've shown a graph. Perhaps I can go quickly back to that one, where I showed you a 1,000 to 10,000-fold difference in RNA concentration between vesicles and the fluids. Because this is corrected using spikes, this is the picture, and standardized per volume unit, you see that the concentration of total RNA is simply much lower in vesicles. The picture is slightly different for microRNAs, where our results indicate that microRNAs are much more abundant in vesicles compared to their longer RNA counterparts.
Okay. Thank you. Working with urine is very challenging since the sample can be really concentrated or diluted. So how do you normalize the data? Do you have any suggestions?
Yes, I think data normalization really comes at the end. I would like to suggest to work on the standardization of the collection first. We don't have an answer on what's the best way, but I think the key is standardization. For instance, you take morning urine and mid-stream. That could be an example. And I think it's important to standardize that. So make sure that, for instance, the concentrations of the different metabolites and salts are very similar. Mid-stream, morning urine. For instance, there are very good collection devices to standardize that. Once you've tried to standardize how you collect the urine, the second key part is to do a quick spin to remove the cells. Of course, the cells are also useful, but if you only want to look at extracellular RNA, you do a quick spin down. You take the supernatant, and then, as I explained already, we add spikes during extraction and during library prep. And then you really have different options for normalization. You can use the spikes, which allow you to normalize based on urine volume if you use the spikes added during RNA extraction. Or you could normalize based on RNA eluate volume if you use the spikes present in the RNA eluate. Or you could normalize using the endogenous RNAs. And to be honest, we have different scenarios where different normalization strategies are better. So how can you assess if a normalization strategy works? Well, one is the data makes more sense. You have, for instance, more differentially expressed genes or more differential pathways that make sense if you take a look at the genes or the pathways through literature, or you see larger fold changes, or you have more significant p-values or a combination of these three. And these parameters allow you to choose a normalization strategy. And there is no good and bad. There is no right and wrong.
I think as long as you clearly state in your scientific report how you normalize your data and what the results are, I think any method is fine.
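As an illustration of the two spike-based options described in this answer, here is a minimal Python sketch. It is not the pipeline used in the talk: the function name, the gene and spike identifiers, and all counts are hypothetical, and it assumes a simple scaling in which endogenous read counts are divided by the summed reads of one spike-in set.

```python
def normalize_to_spikes(gene_counts, spike_counts):
    """Divide each endogenous read count by the total reads of one spike-in set.

    Assumption: the same amount of spike-in material was added to every sample,
    so the summed spike reads act as a per-sample scaling factor.
    """
    spike_total = sum(spike_counts.values())
    if spike_total == 0:
        raise ValueError("no spike-in reads found; cannot normalize")
    return {gene: count / spike_total for gene, count in gene_counts.items()}


# Hypothetical read counts for one urine sample.
genes = {"GENE_A": 200, "GENE_B": 50}
# Spikes added to the urine during RNA extraction: relative to biofluid volume.
extraction_spikes = {"SPIKE_1": 60, "SPIKE_2": 40}
# Spikes added to the RNA eluate: relative to eluate volume.
eluate_spikes = {"SPIKE_3": 300, "SPIKE_4": 200}

per_fluid_volume = normalize_to_spikes(genes, extraction_spikes)
per_eluate_volume = normalize_to_spikes(genes, eluate_spikes)
```

Running both variants on the same samples and comparing the downstream results is one way to apply the selection criteria mentioned in the answer.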
Thank you. A question on the case two with EVs. Is there already a publication with methods and results regarding this case two?
Exactly. This is published in Scientific Reports, 2019, and the first author is Everaert. I will quickly go back to that. I would think it's the very first slide. Yes, it's this one, the top one, Scientific Reports, 2019, in which we present the Pico v2 method to study fluids and derived EVs from conditioned medium, plasma, and urine.
Thank you. How did you extract RNA from FFPE samples?
We have different methods, but in this particular case, it was Qiagen's method for FFPE RNA extraction. We have also good experience with the Promega method, and I'm sure there are many others out there. I think extracting RNA from FFPE is nowadays not the biggest challenge, and sequencing definitely not anymore as well.
How many plasma samples should be used to get reliable data if searching for specific circulating long non-coding RNA by RNA-seq in a patients-versus-controls type of study?
I'm again going to provide a bold answer: more. Really, it's an excellent question, but it's difficult to answer. We always recommend more. And I know because of sample availability and budget constraints, many researchers use sample cohorts that are way too small. And I think this is probably underlying or is part of the reproducibility crisis in the scientific literature, that the cohorts are too small, that results are anecdotal or coincidental, and that people cannot confirm. So a strong recommendation to include as many as possible. I obviously understand this is not very helpful if you're new to the field. That's why we would say a good starting point is 24 patients per group if you do group-based analysis, let's say, treated versus untreated. If you have matched samples from the same patients or donors where you can do some longitudinal monitoring, it seems that you can get away with fewer patients because the intrinsic variability or the inter-individual donor variability is compensated by looking at differences over time in the same donor. Twenty-four in at least two different groups, perhaps 12 samples, patients/donors, is a good pilot. I recommend doing this pilot, learning from it, and then doing a really proper power analysis to determine how many samples you need. In the question, there was a reference to long non-coding RNA. We all know that long non-coding RNAs are typically low abundant. I would say typically because we know that MALAT1 and NEAT1, for instance, are extremely abundant in human plasma. So they really eat up the majority of the reads. So many of the long non-coding RNAs are actually quite low abundant. So what you do with low-abundant, noisy data, there are two ways to solve that. It's more data points, but also try to increase your sample input volume. Instead of using, for instance, 200 microliters of plasma, work with a kit that can handle 4 to 5 ml of plasma input.
You will have a much better signal, many more reads to these low-abundant, long non-coding RNAs and much better data. Hence, better statistics.
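To make the "proper power analysis" step more concrete, here is a minimal Python sketch. It is not the method used by the speaker: it assumes a two-sided comparison of two equally sized groups for a single gene, uses the textbook normal approximation, and takes the standardized effect size (mean difference divided by standard deviation) as an estimate from pilot data. The function name and default values are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided, two-group comparison.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2,
    where d is the standardized effect size estimated from a pilot.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# A large effect (d = 0.8) at alpha = 0.05 and 80% power.
print(samples_per_group(0.8))  # 25
# A medium effect (d = 0.5) needs far more samples.
print(samples_per_group(0.5))  # 63
```

This approximation ignores multiple-testing correction across thousands of genes, which calls for a stricter alpha and therefore more samples, so the numbers are better read as a lower bound than as a study design.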
Thank you. Next question, and maybe that will be the last one. There are others, and we will send you the answers to those via email. So let's take a very last question. Is there a need for doing RNase digestion to exclude cell-free RNA and just end up with EV microRNAs?
Yes, this is the recommendation. If you want to study vesicles and you want to make sure that the RNA resulting from an EV purification is RNA contained inside a vesicle, it's strongly recommended, according to the ISEV guidelines, to treat your vesicles with an RNase and sometimes also a protease. By doing a combination, alternating proteases and RNases, you can have a very good understanding of whether the RNA is inside the vesicles or bound to vesicles.