Mixture models reveal multiple positional bias types in RNA-Seq data and lead to accurate transcript concentration estimates.

Author information

Tuerk Andreas, Wiktorin Gregor, Güler Serhat

Affiliation

Lexogen GmbH, Vienna, Austria.

Publication information

PLoS Comput Biol. 2017 May 15;13(5):e1005515. doi: 10.1371/journal.pcbi.1005515. eCollection 2017 May.

Abstract

Accuracy of transcript quantification with RNA-Seq is negatively affected by positional fragment bias. This article introduces Mix2 (pronounced "mixquare"), a transcript quantification method that uses a mixture of probability distributions to model, and thereby neutralize, the effects of positional fragment bias. The parameters of Mix2 are trained by Expectation Maximization, yielding simultaneous estimates of transcript abundance and bias. We compare Mix2 to Cufflinks, RSEM, eXpress and PennSeq, state-of-the-art quantification methods that implement some form of bias correction. On four synthetic biases, we show that the accuracy of Mix2 overall exceeds that of the other methods and that its bias estimates converge to the correct solution. We further evaluate Mix2 on real RNA-Seq data from the Microarray and Sequencing Quality Control (MAQC, SEQC) Consortia. On MAQC data, Mix2 achieves improved correlation to qPCR measurements, with a relative increase in R² between 4% and 50%. Mix2 also yields repeatable concentration estimates across technical replicates, with a relative increase in R² between 8% and 47% and reduced standard deviation across the full concentration range. We further observe more accurate detection of differential expression, with a relative increase in true positives between 74% and 378% at 5% false positives. In addition, Mix2 reveals 5 dominant biases in MAQC data that deviate from the common assumption of a uniform fragment distribution. On SEQC data, Mix2 yields higher consistency between measured and predicted concentration ratios. A relative error of 20% or less is obtained for 51% of transcripts by Mix2, 40% by Cufflinks and RSEM, and 30% by eXpress. Titration order consistency is correct for 47% of transcripts with Mix2, 41% with Cufflinks and RSEM, and 34% with eXpress. We also observe improved repeatability across laboratory sites, with a relative increase in R² between 8% and 44% and reduced standard deviation.
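The idea of fitting transcript abundances and a bias mixture together by Expectation Maximization can be illustrated with a minimal sketch. This is not the Mix2 implementation. For illustration only, we assume that fragment positions are reduced to B relative-position bins, that the bias is a mixture of K fixed, hand-chosen shapes (the hypothetical `basis` rows, such as uniform or a 3' ramp) whose weights are shared by all transcripts, and that every fragment is compatible with at least one transcript. The names `em_positional_mixture`, `concentrations` and `simulate` are hypothetical. Each iteration assigns every fragment a responsibility over (transcript, bias shape) pairs and then re-estimates both the transcript fractions and the bias weights from those expected counts, so abundance and bias come out of a single fit.

```python
import numpy as np


def em_positional_mixture(compat, bins, lengths, basis, n_iter=300):
    """Jointly estimate transcript fractions and bias weights by EM (sketch).

    compat:  bool array (N, T), True where fragment n is compatible with transcript t.
    bins:    int array (N, T), relative-position bin (0..B-1) of fragment n on
             transcript t; must be a valid bin even where compat is False.
    lengths: array (T,) of transcript lengths.
    basis:   array (K, B); row k is a fixed distribution over the B bins
             (hypothetical bias shapes, e.g. uniform or a 3' ramp).
    Returns (theta, w): fragment fractions per transcript, weights per bias shape.
    """
    compat = np.asarray(compat, dtype=bool)
    bins = np.asarray(bins, dtype=int)
    lengths = np.asarray(lengths, dtype=float)
    basis = np.asarray(basis, dtype=float)
    n_frag, n_tx = compat.shape
    n_comp = basis.shape[0]
    theta = np.full(n_tx, 1.0 / n_tx)
    w = np.full(n_comp, 1.0 / n_comp)
    # P(position | t, k) is proportional to basis[k, bin] / L_t (uniform within a
    # bin); the constant factor B cancels when responsibilities are normalized.
    pos_lik = basis.T[bins] / lengths[None, :, None] * compat[:, :, None]
    for _ in range(n_iter):
        # E-step: responsibility of each (transcript, bias shape) pair per fragment.
        joint = theta[None, :, None] * w[None, None, :] * pos_lik
        resp = joint / joint.sum(axis=(1, 2), keepdims=True)
        # M-step: expected counts become the new mixing proportions.
        theta = resp.sum(axis=(0, 2)) / n_frag
        w = resp.sum(axis=(0, 1)) / n_frag
    return theta, w


def concentrations(theta, lengths):
    """Length-normalize fragment fractions into relative concentrations."""
    c = np.asarray(theta, dtype=float) / np.asarray(lengths, dtype=float)
    return c / c.sum()


def simulate(n, theta, w, basis, rng):
    """Draw n uniquely mapped fragments from the same hypothetical model."""
    basis = np.asarray(basis, dtype=float)
    n_comp, n_bins = basis.shape
    n_tx = len(theta)
    t = rng.choice(n_tx, size=n, p=theta)
    k = rng.choice(n_comp, size=n, p=w)
    # Inverse-CDF sampling of a position bin from the chosen bias shape.
    cdf = np.cumsum(basis, axis=1)[k]
    b = np.minimum((rng.random(n)[:, None] > cdf).sum(axis=1), n_bins - 1)
    compat = np.zeros((n, n_tx), dtype=bool)
    compat[np.arange(n), t] = True
    bins = np.repeat(b[:, None], n_tx, axis=1)
    return compat, bins, t
```

For example, simulating 20,000 fragments with `simulate(20000, [0.3, 0.7], [0.6, 0.4], basis, np.random.default_rng(0))`, where `basis` stacks a uniform row and a linear 3' ramp over 10 bins, and passing the result to `em_positional_mixture` should recover bias weights close to 0.6 and 0.4. The division by length in `concentrations` reflects that longer transcripts yield more fragments per molecule, so fragment fractions and concentrations differ.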
