Suppr超能文献

鉴定和纠正端粒纳米孔测序中的重复调用错误。

Identifying and correcting repeat-calling errors in nanopore sequencing of telomeres.

机构信息

Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA.

Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

出版信息

Genome Biol. 2022 Aug 26;23(1):180. doi: 10.1186/s13059-022-02751-6.

Abstract

Nanopore long-read sequencing is an emerging approach for studying genomes, including long repetitive elements like telomeres. Here, we report extensive basecalling induced errors at telomere repeats across nanopore datasets, sequencing platforms, basecallers, and basecalling models. We find that telomeres in many organisms are frequently miscalled. We demonstrate that tuning of nanopore basecalling models leads to improved recovery and analysis of telomeric regions, with minimal negative impact on other genomic regions. We highlight the importance of verifying nanopore basecalls in long, repetitive, and poorly defined regions, and showcase how artefacts can be resolved by improvements in nanopore basecalling models.

摘要

纳米孔长读测序是一种新兴的研究基因组的方法,包括端粒等长重复元件。在这里,我们报告了在纳米孔数据集、测序平台、碱基调用器和碱基调用模型中,端粒重复序列的广泛碱基调用诱导错误。我们发现许多生物的端粒经常被误报。我们证明,纳米孔碱基调用模型的调整可以改善端粒区域的恢复和分析,而对其他基因组区域的负面影响最小。我们强调了在长、重复和定义不明确的区域中验证纳米孔碱基调用的重要性,并展示了如何通过改进纳米孔碱基调用模型来解决伪影问题。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c0d/9414165/2dcb753a7fc6/13059_2022_2751_Fig1_HTML.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验