

Benchmarking of computational error-correction methods for next-generation sequencing data.

Author information

Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA.

Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA.

Publication information

Genome Biol. 2020 Mar 17;21(1):71. doi: 10.1186/s13059-020-01988-3.

Abstract

BACKGROUND

Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown.

RESULTS

In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods.
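As an illustration of how UMI-based error elimination can work in principle, the sketch below groups reads that share a unique molecular identifier (UMI) and takes a per-position majority vote to build one consensus read per UMI. This is a minimal sketch, not the protocol used in the paper: the function name `umi_consensus`, the `min_cluster_size` parameter, the simple-majority rule, and the assumption that all reads in a UMI group have equal length are illustrative assumptions.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_cluster_size=2):
    """Build one consensus sequence per UMI by per-position majority vote.

    reads: iterable of (umi, sequence) pairs.
    Assumption (hypothetical simplification): reads sharing a UMI come from
    the same molecule and have equal length, so no alignment is needed.
    """
    # Group sequences by their UMI tag.
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        # Skip UMI groups too small to support a vote.
        if len(seqs) < min_cluster_size:
            continue
        # zip(*seqs) yields one column of bases per read position;
        # the most common base in each column becomes the consensus base.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


example_reads = [
    ("AAC", "ACGT"),
    ("AAC", "ACGT"),
    ("AAC", "ACTT"),  # one read with a sequencing error at position 2
    ("GGT", "TTTT"),  # singleton UMI, dropped by min_cluster_size
]
example_consensus = umi_consensus(example_reads)
```

In this example the single erroneous base in the third `AAC` read is outvoted by the other two reads. Consensus reads of this kind can serve as an error-free reference against which error-correction tools are scored.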

CONCLUSIONS

In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.
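The balance between precision and sensitivity can be made concrete with a per-base comparison of the true, raw, and corrected versions of a read. The sketch below is one plausible way to compute the two metrics, under the assumption that the three sequences have equal length and are already aligned position by position. The function name `evaluate_correction` and the exact category definitions are illustrative assumptions and may differ from those used in the paper.

```python
def evaluate_correction(true_seq, raw_seq, corrected_seq):
    """Score an error-corrected read against the known true sequence.

    Per-base categories (assumed definitions):
      TP: base was erroneous in the raw read and is now correct
      FN: base was erroneous in the raw read and is still wrong
      FP: base was correct in the raw read and is now wrong
    """
    if not (len(true_seq) == len(raw_seq) == len(corrected_seq)):
        raise ValueError("sequences must have equal length")

    tp = fp = fn = 0
    for t, r, c in zip(true_seq, raw_seq, corrected_seq):
        if r != t:          # sequencing error present in the raw read
            if c == t:
                tp += 1     # error fixed
            else:
                fn += 1     # error missed or miscorrected
        elif c != t:
            fp += 1         # correct base wrongly altered

    # Return None when a metric is undefined (zero denominator).
    precision = tp / (tp + fp) if (tp + fp) else None
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "sensitivity": sensitivity}


# Raw read has errors at positions 2 and 7; the corrector fixes position 2,
# misses position 7, and introduces a new error at position 6.
example_scores = evaluate_correction("ACGTACGT", "ACCTACGA", "ACGTACTA")
```

A tool that changes many bases may fix more errors (higher sensitivity) while also introducing more new ones (lower precision), which is why the two metrics are reported together.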


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a13/7079412/bcc2916505f4/13059_2020_1988_Fig1_HTML.jpg
