Altius Institute for Biomedical Sciences, 2211 Elliott Ave, Seattle, WA 98121, United States.
Stem Cell Program, Boston Children's Hospital, 300 Longwood Avenue, Boston, MA 02115, United States.
Bioinformatics. 2023 Jun 30;39(Suppl 1):i431-i439. doi: 10.1093/bioinformatics/btad254.
Analysis of allele-specific expression is strongly affected by the technical noise present in RNA-seq experiments. Previously, we showed that technical replicates can be used for precise estimates of this noise, and we provided a tool for correction of technical noise in allele-specific expression analysis. This approach is very accurate but costly due to the need for two or more replicates of each library. Here, we develop a spike-in approach which is highly accurate at only a small fraction of the cost.
We show that a distinct RNA added as a spike-in before library preparation reflects technical noise of the whole library and can be used in large batches of samples. We experimentally demonstrate the effectiveness of this approach using combinations of RNA from species distinguishable by alignment, namely, mouse, human, and Caenorhabditis elegans. Our new approach, controlFreq, enables highly accurate and computationally efficient analysis of allele-specific expression in (and between) arbitrarily large studies at an overall cost increase of ∼5%.
The analysis pipeline for this approach is available on GitHub as the R package controlFreq (github.com/gimelbrantlab/controlFreq).
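To make the idea concrete, here is a minimal sketch of how spike-in allelic counts could calibrate a test for allelic imbalance. It is not the controlFreq implementation. It assumes a simple model in which the spike-in genes have a known expected allelic fraction (0.5 here) and technical noise inflates the binomial variance by one library-wide factor. The function names `overdispersion_factor` and `allelic_imbalance_z`, and all counts, are hypothetical.

```python
import math


def overdispersion_factor(spike_counts, expected_fraction=0.5):
    """Estimate a library-wide variance inflation factor from spike-in genes.

    Assumption (not from the paper): each spike-in gene has a known expected
    allelic fraction, and noise scales the binomial variance by one constant.
    The estimate is the mean squared standardized residual (Pearson-type).
    """
    ratios = []
    for ref, alt in spike_counts:
        n = ref + alt
        if n == 0:
            continue  # a gene without coverage carries no information
        binom_var = n * expected_fraction * (1 - expected_fraction)
        ratios.append((ref - n * expected_fraction) ** 2 / binom_var)
    if not ratios:
        raise ValueError("no spike-in genes with nonzero coverage")
    # Never report less noise than the pure binomial sampling model.
    return max(1.0, sum(ratios) / len(ratios))


def allelic_imbalance_z(ref, alt, phi, null_fraction=0.5):
    """Z-score for a target gene with the variance inflated by phi."""
    n = ref + alt
    sd = math.sqrt(phi * n * null_fraction * (1 - null_fraction))
    return (ref - n * null_fraction) / sd


# Hypothetical (reference, alternative) read counts for three spike-in genes.
spike_in = [(60, 40), (45, 55), (70, 30)]
phi = overdispersion_factor(spike_in)

# The same target gene looks far less extreme once the noise is accounted for.
z_naive = allelic_imbalance_z(80, 20, phi=1.0)
z_corrected = allelic_imbalance_z(80, 20, phi=phi)
print(phi, z_naive, z_corrected)
```

Under these assumptions, one factor estimated from the spike-in is reused for every gene in the library, which is why no replicate library is needed.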