Misclassifications in human papillomavirus databases.

Author information

Arroyo Mühr Laila Sara, Eklund Carina, Dillner Joakim

Affiliations

International HPV Reference Center, Department of Laboratory Medicine, Karolinska Institutet, SE-141 86 Stockholm, Sweden.

Publication information

Virology. 2021 Jun;558:57-66. doi: 10.1016/j.virol.2021.03.002. Epub 2021 Mar 11.

Abstract

We assessed the quality of human papillomavirus (HPV) sequences in GenBank by analyzing the possible presence of chimeras, "wrong-assembled" contigs and taxonomy errors, using an open-source script (HPVChimera_Gb) that compared 25 638 HPV-related nucleotide sequences in GenBank with the 221 numbered HPV types and another 220 complete HPV sequences. There were 110 sequences with taxonomy/naming errors (sequences reported as an HPV type other than the one they corresponded to) and 1318 possibly chimeric sequences. Manual analysis found plausible explanations for most of them (e.g. a sequence covering an integration site), but 114 sequences appeared to be chimeras (96/114 were already flagged as "unverified" by GenBank) and 13 had taxonomy/naming errors. Comparing all correct HPV sequences in GenBank suggested that about 800 unique putative HPV types exist. Systematic and regular work towards eliminating chimeric sequences and taxonomy/naming errors could improve the quality and order of HPV research.
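The abstract describes the screening only at a high level. One plausible way to implement such checks is to compare each record against reference sequences: if its closest reference differs from the type it is labelled as, that suggests a taxonomy/naming error; if different segments of it best match different references, that suggests a possible chimera. The sketch below illustrates this general idea under stated assumptions and is not the HPVChimera_Gb code. The helper names (`kmers`, `best_type`, `check_record`) are hypothetical, the non-overlapping 200 nt windows and k = 8 are assumed values, k-mer containment is a crude stand-in for a real sequence alignment, and the toy references are random sequences rather than real HPV genomes.

```python
import random  # only used by the toy example at the bottom


def kmers(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings of seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_type(fragment: str, ref_kmers: dict[str, set[str]], k: int) -> tuple[str, float]:
    """Return the reference sharing the largest fraction of the fragment's k-mers.

    K-mer containment is an assumed, simplified proxy for alignment identity.
    """
    frag = kmers(fragment, k)
    if not frag:
        raise ValueError("fragment is shorter than k")
    scores = {name: len(frag & ks) / len(frag) for name, ks in ref_kmers.items()}
    name = max(scores, key=scores.get)  # ties go to the first reference listed
    return name, scores[name]


def check_record(label: str, seq: str, refs: dict[str, str],
                 window: int = 200, k: int = 8) -> dict:
    """Hypothetical per-record check: possible naming error and possible chimera."""
    ref_kmers = {name: kmers(s, k) for name, s in refs.items()}  # build once per call
    overall, _ = best_type(seq, ref_kmers, k)
    # Non-overlapping windows; a trailing partial window is ignored.
    window_hits = [best_type(seq[i:i + window], ref_kmers, k)[0]
                   for i in range(0, len(seq) - window + 1, window)]
    return {
        "best_match": overall,
        "naming_error": overall != label,               # label != closest reference
        "possible_chimera": len(set(window_hits)) > 1,  # segments match different types
        "window_hits": window_hits,
    }


if __name__ == "__main__":
    # Toy references: random sequences standing in for two HPV types (assumption).
    rng_a, rng_b = random.Random(1), random.Random(2)
    refs = {"typeA": "".join(rng_a.choice("ACGT") for _ in range(1000)),
            "typeB": "".join(rng_b.choice("ACGT") for _ in range(1000))}
    chimera = refs["typeA"][:400] + refs["typeB"][400:800]
    print(check_record("typeA", chimera, refs)["window_hits"])
    # prints ['typeA', 'typeA', 'typeB', 'typeB']
```

A real screen would use proper alignments and identity thresholds rather than k-mer overlap, and flagged records would still need manual review, since the abstract reports that most flagged sequences had plausible explanations such as covering an integration site.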
