Error, noise and bias in de novo transcriptome assemblies.

Affiliation

Faculty of Arts and Sciences Informatics Group, Harvard University, Cambridge, MA, USA.

Publication information

Mol Ecol Resour. 2021 Jan;21(1):18-29. doi: 10.1111/1755-0998.13156. Epub 2020 Apr 13.

Abstract

De novo transcriptome assembly is a powerful tool, and has been widely used over the last decade for making evolutionary inferences. However, it relies on two implicit assumptions: that the assembled transcriptome is an unbiased representation of the underlying expressed transcriptome, and that expression estimates from the assembly are good, if noisy, approximations of the relative abundance of expressed transcripts. Using publicly available data for model organisms, we demonstrate that, across assembly algorithms and data sets, these assumptions are consistently violated. Bias exists at the nucleotide level, with genotyping error rates ranging from 30% to 83%. As a result, diversity is underestimated in transcriptome assemblies, with consistent underestimation of heterozygosity in all but the most inbred samples. Even at the gene level, expression estimates show wide deviations from map-to-reference estimates, and positive bias at lower expression levels. Standard filtering of transcriptome assemblies improves the robustness of gene expression estimates but leads to the loss of a meaningful number of protein-coding genes, including many that are highly expressed. We demonstrate a computational method, length-rescaled CPM, to partly alleviate noise and bias in expression estimates. Researchers should consider ways to minimize the impact of bias in transcriptome assemblies.
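
As background for the expression measure named in the abstract, the sketch below computes ordinary counts per million (CPM) and, for contrast, a generic length-normalised measure in the style of TPM. The abstract does not define how length-rescaled CPM is calculated, so neither function should be read as the authors' method. The function names, gene labels, counts and lengths are all hypothetical and chosen only for illustration.

```python
def counts_per_million(counts):
    """Plain CPM: each gene's read count scaled so the library sums to 1e6."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def length_normalised_per_million(counts, lengths):
    """TPM-style measure: divide counts by length first, then scale to 1e6.

    This is a standard length normalisation shown for comparison only;
    it is NOT the paper's length-rescaled CPM, whose formula the abstract
    does not give.
    """
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    return {gene: r * 1e6 / total for gene, r in rates.items()}


# Hypothetical toy library: three genes, read counts and lengths in bases.
toy_counts = {"geneA": 10, "geneB": 30, "geneC": 60}
toy_lengths = {"geneA": 1000, "geneB": 1000, "geneC": 2000}

cpm = counts_per_million(toy_counts)
tpm_like = length_normalised_per_million(toy_counts, toy_lengths)
```

In this toy library geneC has twice the reads of geneB but is also twice as long, so the two genes receive the same length-normalised value while their plain CPM values differ by a factor of two. This is the kind of length dependence that any length-aware expression measure has to account for.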
