Resampling nucleotide sequences with closest-neighbor trimming and its comparison to other methods.

Affiliation

Department of Computer Bioscience, Nagahama Institute of Bio-science and Technology, Nagahama, Shiga-pref, Japan.

Publication information

PLoS One. 2013;8(2):e57684. doi: 10.1371/journal.pone.0057684. Epub 2013 Feb 27.

DOI:10.1371/journal.pone.0057684

PMID:23460894

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3583903/

Abstract

A large number of nucleotide sequences of various pathogens are available in public databases. The growth of these datasets has greatly increased computational costs. Moreover, because surveillance activities differ, the number of sequences in databases varies from one country to another and from year to year. It is therefore important to study resampling methods that reduce this sampling bias. A novel algorithm, called the closest-neighbor trimming method, was proposed; it resamples a given number of sequences from a large nucleotide sequence dataset. The performance of the proposed algorithm was compared with that of other algorithms using the nucleotide sequences of human H3N2 influenza viruses. Specifically, the closest-neighbor trimming method was compared with the naive hierarchical clustering algorithm and the k-medoids clustering algorithm. Genetic information accumulated in public databases contains sampling bias, and the closest-neighbor trimming method can thin out densely sampled sequences from a given dataset. Since nucleotide sequences are among the most widely used materials in the life sciences, we anticipate that applying our algorithm to various datasets will reduce sampling bias.
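The abstract does not spell out the steps of closest-neighbor trimming, so the following is only a minimal sketch of one plausible reading of the name: repeatedly find the two closest remaining sequences and drop one of them until the requested number is left. The function names (`hamming`, `closest_neighbor_trim`), the use of Hamming distance on pre-aligned sequences, and the tie-breaking rule are all assumptions made for illustration; they are not taken from the paper.

```python
from __future__ import annotations

from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_neighbor_trim(seqs: dict[str, str], target: int) -> dict[str, str]:
    """Thin a dataset down to `target` sequences (illustrative sketch only).

    Assumption: at each step the closest remaining pair is located and one
    member of that pair is removed, so densely sampled regions shrink first.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    kept = dict(seqs)
    while len(kept) > target:
        # Brute-force search for the closest remaining pair of sequences.
        _, drop = min(
            combinations(sorted(kept), 2),
            key=lambda pair: hamming(kept[pair[0]], kept[pair[1]]),
        )
        # Hypothetical tie-break: drop the name that sorts later in the pair.
        del kept[drop]
    return kept


if __name__ == "__main__":
    toy = {"s1": "AAAA", "s2": "AAAT", "s3": "CCCC", "s4": "GGGG"}
    print(sorted(closest_neighbor_trim(toy, 3)))
```

In the toy dataset, `s1` and `s2` differ at a single site, so one of them is removed first, which matches the abstract's description of thinning out densely sampled sequences. The brute-force pair search is recomputed on every pass and would be too slow for datasets of the size the abstract has in mind; a real implementation would presumably cache the distance matrix.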
