
School of Biological Sciences, Seoul National University, Seoul, 08826, South Korea.

Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, 21218, Maryland, USA.

Genome Biol. 2020 May 12;21(1):115. doi: 10.1186/s13059-020-02023-1.

Genomic analyses are sensitive to contamination in public databases caused by incorrectly labeled reference sequences. Here, we describe Conterminator, an efficient method to detect and remove incorrectly labeled sequences by an exhaustive all-against-all sequence comparison. Our analysis reports contamination of 2,161,746, 114,035, and 14,148 sequences in the RefSeq, GenBank, and NR databases, respectively, spanning the whole range from draft to "complete" model organism genomes. Our method scales linearly with input size and can process 3.3 TB in 12 days on a 32-core computer. Conterminator can help ensure the quality of reference databases. Source code (GPLv3): https://github.com/martin-steinegger/conterminator.
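To make the idea of an all-against-all comparison concrete, the following is a toy sketch only, not Conterminator's actual algorithm. It assumes each record carries a kingdom label and flags pairs of sequences that share an exact k-mer while being labeled with different kingdoms. The input format, the function names `kmers` and `flag_cross_kingdom`, and the value of `k` are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def flag_cross_kingdom(records, k=8):
    """Return sorted pairs of sequence IDs that share a k-mer but differ in kingdom.

    records: list of (seq_id, kingdom, sequence) tuples (hypothetical format).
    """
    # Build a shared index in one pass: k-mer -> set of (seq_id, kingdom).
    index = defaultdict(set)
    for seq_id, kingdom, seq in records:
        for kmer in kmers(seq.upper(), k):
            index[kmer].add((seq_id, kingdom))

    # Any k-mer bucket holding more than one kingdom is a candidate.
    flagged = set()
    for hits in index.values():
        hits = sorted(hits)
        for i, (id_a, king_a) in enumerate(hits):
            for id_b, king_b in hits[i + 1:]:
                if king_a != king_b:
                    flagged.add((id_a, id_b))
    return sorted(flagged)


# Made-up example data, not taken from any database.
records = [
    ("seq1", "Bacteria", "ACGTACGTTTGACCA"),
    ("seq2", "Metazoa", "GGGACGTACGTTTCC"),
    ("seq3", "Bacteria", "TTTTTTTTTTTTTTT"),
]
print(flag_cross_kingdom(records))
```

Building the index takes a single pass over the input, but the pair enumeration inside each bucket is quadratic in the bucket size, so this sketch does not reproduce the linear scaling reported in the abstract. A real tool would also need to tolerate mismatches and to decide which member of a flagged pair is the contaminant.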



Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank.
