Zhang Zehua, Guo Kecheng, Pan Gaofeng, Tang Jijun, Guo Fei
School of Computer Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin, People's Republic of China.
Department of Computer Science and Engineering, University of South Carolina, Columbia, USA.
BMC Syst Biol. 2017 Sep 21;11(Suppl 4):79. doi: 10.1186/s12918-017-0453-x.
Phylogenetic analysis is a key approach to understanding biological processes and to testing theories of evolution by natural selection. The evolutionary relationships between species are generally represented as phylogenetic trees. Many methods for constructing phylogenetic trees are based on optimization criteria. We extract features from biological data through modeling, and then compare these features to study the evolution of species.
Here, we use maximum likelihood and Bayesian inference methods to build phylogenetic trees; multi-chain Markov chain Monte Carlo sampling is used to select the optimal phylogenetic tree, which addresses the local optimum problem. Standard models for phylogenetic analysis assume that trees are built on compositionally homogeneous data; in the presence of heterogeneous data, large deviations can arise. We use conscious detection to address compositional heterogeneity. Our method is evaluated on two sets of experimental data: a set of bacterial 16S ribosomal RNA gene data and a set of genetic data from five homologous species.
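As an illustration of what detecting compositional heterogeneity can involve, the following is a minimal sketch of a chi-square homogeneity statistic on base frequencies across taxa. It is not the detection procedure used in this work, which the abstract does not specify; the function names `base_counts` and `composition_chi2` and the input layout (a dict mapping taxon name to sequence) are assumptions made for the example.

```python
from collections import Counter

BASES = "ACGT"


def base_counts(seq):
    """Count A, C, G, T in one sequence, ignoring gaps and ambiguity codes."""
    # RNA sequences (e.g. 16S rRNA) are mapped onto the DNA alphabet.
    counts = Counter(ch for ch in seq.upper().replace("U", "T") if ch in BASES)
    return [counts[b] for b in BASES]


def composition_chi2(seqs):
    """Chi-square statistic for homogeneity of base composition across taxa.

    seqs: dict mapping a (hypothetical) taxon name to its sequence.
    Returns (statistic, degrees_of_freedom).
    """
    table = [base_counts(s) for s in seqs.values()]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    if grand == 0:
        raise ValueError("no countable bases in the input sequences")

    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            # Expected count if every taxon shared the pooled composition.
            expected = row_total * col_total / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected

    # Bases that never occur contribute no degrees of freedom.
    used_cols = sum(1 for c in col_totals if c > 0)
    dof = (len(table) - 1) * (used_cols - 1)
    return stat, dof
```

A large statistic relative to the chi-square distribution with the returned degrees of freedom suggests that base composition differs among taxa. Because sequences related by a tree are not independent samples, a test of this kind is best treated as a rough screen rather than a definitive result.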
Our method obtains accurate phylogenetic trees on the homologous data and also detects compositional heterogeneity in the experimental data. We provide an efficient method to improve the accuracy of the generated phylogenetic trees.