Sequencing artifacts in the type A influenza databases and attempts to correct them.

Affiliation

Exotic and Emerging Avian Viral Disease Research Unit, Southeast Poultry Research Laboratory, Agricultural Research Service, USDA, Athens, GA, USA.

Publication information

Influenza Other Respir Viruses. 2014 Jul;8(4):499-505. doi: 10.1111/irv.12239. Epub 2014 Feb 7.

DOI:10.1111/irv.12239

PMID:24512607

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4181811/

Abstract

BACKGROUND

There are over 276 000 influenza gene sequences in public databases, with the quality of the sequences determined by the contributor.

OBJECTIVE

As part of a high school class project, influenza sequences in the public databases were flagged as possibly erroneous when the gene segment was longer than expected, on the hypothesis that the extra length reflected an error. Students contacted the sequence submitters, alerted them to the possible sequence issue(s), and requested that the suspect sequence(s) be corrected as appropriate.

METHODS

Type A influenza virus sequences were screened, and gene segments longer than the accepted size were identified for further analysis. Attention was focused on sequences with additional nucleotides upstream or downstream of the highly conserved non-coding ends of the viral segments.
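
The length-based screen described above can be sketched in a few lines of Python. This is an illustration only, not the procedure used in the study: the segment lengths in `EXPECTED_LENGTH` are approximate, assumed values (HA and NA lengths vary by subtype), and the record layout, the `tolerance` parameter, and the function name `flag_overlength` are hypothetical.

```python
# Approximate full-length sizes (nt) of the eight influenza A segments.
# ASSUMPTION: illustrative values only; HA and NA lengths differ between
# subtypes, and these are not the cut-offs used in the study.
EXPECTED_LENGTH = {
    "PB2": 2341, "PB1": 2341, "PA": 2233, "HA": 1778,
    "NP": 1565, "NA": 1413, "MP": 1027, "NS": 890,
}


def flag_overlength(records, expected=EXPECTED_LENGTH, tolerance=0):
    """Return (accession, segment, excess_nt) for records longer than expected.

    `records` is an iterable of (accession, segment, sequence) tuples;
    this layout is a hypothetical stand-in for a parsed database export.
    """
    flagged = []
    for accession, segment, sequence in records:
        limit = expected.get(segment)
        if limit is None:
            continue  # unknown segment label: skip rather than guess
        excess = len(sequence) - (limit + tolerance)
        if excess > 0:
            flagged.append((accession, segment, excess))
    return flagged
```

A record flagged this way is only a candidate for manual review; a segment can be longer than expected because of a genuine insertion rather than an artifact.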

RESULTS AND CONCLUSIONS

A total of 1081 sequences were identified that met this criterion. Three types of errors were commonly observed: non-influenza primer sequence was not removed from the sequence; the PCR product was cloned and plasmid sequence was included in the sequence; and Taq polymerase added an adenine at the end of the PCR product. Internal insertions of nucleotide sequence were also commonly observed, but in many cases it was unclear whether the sequence was correct or actually contained an error. A total of 215 sequences, or 22.8% of the suspect sequences, were corrected in the public databases in the first year of the student project. Unfortunately, 138 additional sequences with possible errors were added to the databases in the second year. Greater awareness of the importance of data integrity for sequences submitted to public databases is needed to fully reap the benefits of these large data sets.
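
As a rough illustration of how extra nucleotides outside the conserved ends might be located, the sketch below searches for the conserved influenza A termini in positive-sense (cDNA) orientation and reports whatever lies beyond them. The motifs correspond to the widely used Uni12/Uni13 universal primer regions. The search window, the function names, and the wording of the notes are assumptions made for this example, and the study's actual procedure is not given in the abstract.

```python
import re

# Conserved segment termini in positive-sense (cDNA) orientation.
# Position 4 of the 5' end is A or G depending on the segment.
FIVE_PRIME = re.compile("AGC[AG]AAAGCAGG")
THREE_PRIME = "CCTTGTTTCTACT"


def terminal_extras(seq, window=60):
    """Return (upstream, downstream) nucleotides lying outside the conserved ends.

    A value of None means the corresponding conserved motif was not found
    within `window` nt of that end (ASSUMPTION: 60 nt is an arbitrary choice).
    """
    seq = seq.upper()
    head = FIVE_PRIME.search(seq[:window])
    tail = seq.rfind(THREE_PRIME, max(0, len(seq) - window))
    upstream = seq[:head.start()] if head else None
    downstream = seq[tail + len(THREE_PRIME):] if tail != -1 else None
    return upstream, downstream


def describe(upstream, downstream):
    """Turn the extras into short, human-readable review notes."""
    notes = []
    if upstream:
        notes.append(f"{len(upstream)} extra nt upstream of the 5' conserved end")
    if downstream == "A":
        notes.append("single trailing A (possible Taq-added adenine)")
    elif downstream:
        notes.append(f"{len(downstream)} extra nt downstream of the 3' conserved end")
    return notes
```

Telling leftover primer sequence apart from plasmid sequence would require comparing the extra nucleotides against known primer and vector sequences, which this sketch does not attempt.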
