Wan Xiu-Feng, Chen Guorong, Luo Feng, Emch Michael, Donis Ruben
Department of Microbiology, Miami University, Oxford, OH 45056, USA.
Bioinformatics. 2007 Sep 15;23(18):2368-75. doi: 10.1093/bioinformatics/btm354. Epub 2007 Jul 10.
Computational genotyping analyses are critical for characterizing molecular evolutionary footprints and thus provide important information for designing influenza prevention and control strategies. Most current methods rely on multiple sequence alignment and phylogenetic tree construction, which are time-consuming and limited in the number of taxa they can handle. Arbitrarily defined genotypes further complicate the interpretation of genotyping results.
In this study, we describe a quantitative influenza genotyping algorithm based on the theory of quasispecies. First, the complete composition vector (CCV) was used to calculate the pairwise evolutionary distance between genotypes. Next, hierarchical Bayesian modeling with the Gibbs sampling algorithm was applied to identify the segment genotype threshold, which was then used to assign influenza segment genotypes through a modularity calculation. The viral genotype was defined by combining the eight segment genotypes, reflecting the genetic reassortment of influenza A viruses.
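As a rough illustration of the first and last steps described above, the following Python sketch computes an alignment-free distance between two nucleotide sequences from concatenated k-mer frequency vectors, and then combines eight segment genotype labels into one viral genotype. It is a simplified stand-in, not the published method: the function names, the `max_k` parameter, the cosine-based distance, and the omission of any background correction are assumptions made for illustration, and the Bayesian threshold estimation and modularity steps are not shown.

```python
from collections import Counter
from itertools import product
from math import sqrt

# The eight genome segments of influenza A viruses.
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")


def composition_vector(seq, max_k=3):
    """Concatenate k-mer frequency vectors for k = 1..max_k.

    Assumption: plain frequencies with no background subtraction.
    k-mers containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    vec = []
    for k in range(1, max_k + 1):
        n_windows = len(seq) - k + 1
        counts = Counter(seq[i:i + k] for i in range(max(n_windows, 0)))
        total = max(n_windows, 1)
        for kmer in product("ACGT", repeat=k):
            vec.append(counts["".join(kmer)] / total)
    return vec


def cv_distance(seq_a, seq_b, max_k=3):
    """Distance (1 - cos) / 2 between two composition vectors.

    For non-negative frequency vectors this lies in [0, 0.5].
    """
    u = composition_vector(seq_a, max_k)
    v = composition_vector(seq_b, max_k)
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    if norm == 0:
        raise ValueError("sequences must contain at least one A, C, G, or T")
    cosine = sum(x * y for x, y in zip(u, v)) / norm
    return (1.0 - cosine) / 2.0


def viral_genotype(segment_labels):
    """Combine per-segment genotype labels into one viral genotype tuple."""
    return tuple(segment_labels[s] for s in SEGMENTS)
```

Under these assumptions, two viruses share a viral genotype only when all eight segment labels match, so a reassortant that differs in a single segment is counted as a distinct genotype. The number of distinct genotypes in a collection can then be obtained with `len({viral_genotype(v) for v in viruses})`, where `viruses` is a hypothetical list of per-virus label dictionaries.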
We applied this method to H5N1 avian influenza viruses and identified 107 niches among 283 viruses with a complete genome set. The diversity of viral genotypes and their correlation with geographic locations suggest that these viruses form local niches after being introduced to a new ecological environment through poultry trade or bird migration. This novel method allows us to define genotypes in a robust, quantitative, and hierarchical manner.
Supplementary data are available at Bioinformatics online.