Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Shiga-ken, 526-0829, Japan.
BMC Infect Dis. 2013 Aug 21;13:386. doi: 10.1186/1471-2334-13-386.
With the remarkable increase in microbial and viral sequence data obtained from high-throughput DNA sequencers, novel tools are needed for comprehensive analysis of these large sequence datasets. We have developed the "Batch-Learning Self-Organizing Map (BLSOM)", which can characterize very large numbers of genomic sequences, even millions, on a single plane. Influenza virus is a zoonotic virus and shows clear host tropism. Important issues for bioinformatics studies of influenza viruses are the prediction of genomic sequence changes in the near future and the surveillance of potentially hazardous strains.
To characterize sequence changes in influenza virus genomes after invasion into humans from other animal hosts, we applied BLSOMs to analyses of mono-, di-, tri-, and tetranucleotide compositions in all genome sequences of influenza A and B viruses and found clear host-dependent clustering (self-organization) of the sequences.
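As an illustration of the input to such an analysis, the following is a minimal sketch of how a mono- to tetranucleotide composition vector might be computed for one sequence. The function name `kmer_composition`, the handling of ambiguous bases, and the example fragment are assumptions made for illustration; they are not taken from the study.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq, k):
    """Return relative frequencies of all 4**k k-mers over A, C, G, T.

    Assumption for this sketch: U is mapped to T, and any window that
    contains an ambiguous base (e.g. N) is ignored.
    """
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Normalize only over windows made of unambiguous bases.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return {km: 0.0 for km in kmers}
    return {km: counts[km] / total for km in kmers}


# Hypothetical fragment, used only to show the call pattern.
fragment = "ATGGAGAAAATAGTGCTTCTT"
vectors = {k: kmer_composition(fragment, k) for k in (1, 2, 3, 4)}
```

Each value of `k` yields a vector of length 4^k (4, 16, 64, and 256 entries); one such vector per sequence would be the kind of input a self-organizing map clusters.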
Viruses isolated from humans and from birds differed in mononucleotide composition. In addition, oligonucleotide BLSOMs revealed host-dependent oligonucleotide compositions that could not be explained by the host-dependent mononucleotide composition. Retrospective time-dependent directional changes in mono- and oligonucleotide compositions, visualized for human strains on BLSOMs, could provide predictive information about sequence changes in viruses newly introduced from other animal hosts (e.g. the swine-derived pandemic H1N1/09).
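One common way to check whether an oligonucleotide bias goes beyond what base composition alone would produce is an observed/expected ratio. The sketch below computes this ratio for dinucleotides; it is a generic illustration, not necessarily the procedure used in the study, and the function name `dinucleotide_odds_ratios` is hypothetical.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Observed dinucleotide frequency divided by the product of the
    frequencies of its two bases. Values far from 1 indicate a bias that
    mononucleotide composition alone does not explain.

    Assumption for this sketch: U is mapped to T and ambiguous bases
    are skipped.
    """
    bases = "ACGT"
    seq = seq.upper().replace("U", "T")
    mono = Counter(b for b in seq if b in bases)
    n_mono = sum(mono.values())
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in bases and seq[i + 1] in bases
    )
    n_di = sum(di.values())
    ratios = {}
    for x in bases:
        for y in bases:
            expected = (mono[x] / n_mono) * (mono[y] / n_mono) if n_mono else 0.0
            observed = di[x + y] / n_di if n_di else 0.0
            # The ratio is undefined when a base is absent from the sequence.
            ratios[x + y] = observed / expected if expected > 0 else float("nan")
    return ratios
```

Comparing such ratios between human and avian isolates would separate differences in base composition from differences in how bases are arranged.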
Based on the host-dependent oligonucleotide composition, we proposed a strategy for predicting directional changes in virus sequences and for surveillance of potentially hazardous strains when they are introduced into human populations from non-human sources. Millions of genomic sequences from infectious microbes and viruses have become available because of their medical and social importance, and BLSOM can characterize these large datasets and support efficient knowledge discovery.