Maximum-Likelihood Models for Combined Analyses of Multiple Sequence Data.

Author information

Yang Z

Affiliation

Institute of Molecular Evolutionary Genetics and Department of Biology, The Pennsylvania State University, 328 Mueller Laboratory, University Park, PA 16802, USA

Publication information

J Mol Evol. 1996 May;42(5):587-96. doi: 10.1007/BF02352289.

Abstract

Models of nucleotide substitution were constructed for combined analyses of heterogeneous sequence data (such as those of multiple genes) from the same set of species. The models account for different aspects of the heterogeneity in the evolutionary process of different genes, such as differences in nucleotide frequencies, in substitution rate bias (for example, the transition/transversion rate bias), and in the extent of rate variation across sites. Model parameters were estimated by maximum likelihood, and the likelihood ratio test was used to test hypotheses concerning sequence evolution, such as rate constancy among lineages (the assumption of a molecular clock) and proportionality of branch lengths for different genes. Example data from a segment of the mitochondrial genome of six hominoid species (human, common and pygmy chimpanzees, gorilla, orangutan, and siamang) were analyzed. Nucleotides at the three codon positions in the protein-coding regions and from the tRNA-coding regions were considered heterogeneous data sets. Statistical tests showed that the amount of evolution in the sequence data, as reflected in the estimated branch lengths, can be explained by the codon-position effect and the lineage effect of substitution rates. The assumption of a molecular clock could not be rejected when the data were analyzed separately or when the rate variation among sites was ignored. However, significant differences in substitution rate among lineages were found when the data sets were combined and when the rate variation among sites was accounted for in the models. Under the assumption that the orangutan and African apes diverged 13 million years ago, the combined analysis of the sequence data estimated the times for the human-chimpanzee separation and for the separation of the gorilla as 4.3 and 6.8 million years ago, respectively.
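The likelihood ratio test mentioned in the abstract compares the maximized log-likelihoods of two nested models. The following is a minimal Python sketch of that generic test, not the paper's own implementation. It assumes the usual chi-square approximation for twice the log-likelihood difference. The function names (`chi2_sf`, `likelihood_ratio_test`, `clock_test_df`) and the log-likelihood values in the usage lines are hypothetical and are not taken from the paper. The degrees-of-freedom helper uses the standard parameter count for a single data set on a bifurcating tree (2s - 3 free branch lengths without a clock, s - 1 node ages with a clock); the combined multi-gene models in the paper may call for a different count.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    df = int(df)
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_{r>=1} x^(r-1/2) / (1*3*...*(2r-1))
    tail = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(x)  # r = 1 term
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    phi = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # standard normal density at sqrt(x)
    return tail + 2.0 * phi * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (statistic, p-value) for nested models, with statistic = 2 * (lnL_alt - lnL_null)."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # clamp small negative values from optimizer noise
    return stat, chi2_sf(stat, df)


def clock_test_df(n_species):
    """Degrees of freedom for a clock test on one data set (assumed standard count)."""
    return (2 * n_species - 3) - (n_species - 1)


# Hypothetical log-likelihoods, for illustration only (not values from the paper).
stat, p_value = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=clock_test_df(6))
print(f"2*delta(lnL) = {stat:.2f}, df = {clock_test_df(6)}, p = {p_value:.4f}")
```

The chi-square tail is computed from closed-form series for integer degrees of freedom so that the sketch depends only on the standard library. With SciPy available, `scipy.stats.chi2.sf` would serve the same purpose.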
