Mixed model analysis of DNA sequence evolution.

Author information

Yang Z, Wang T

Affiliation

Department of Zoology, University of Cambridge, United Kingdom.

Publication information

Biometrics. 1995 Jun;51(2):552-61.

PMID:7662844
Abstract

Nucleotides in a DNA sequence may be changing at different rates, because they are located in different structural and functional regions of the gene, and are thus subject to different mutational pressures or selective restrictions. Knowledge of substitution rates at specific sites is important for understanding the forces and mechanisms that have shaped the evolution of the DNA sequences. The gamma distribution has previously been proposed to model such rate variation among nucleotide sites. Based on mixed model methodology we present in this paper a method for predicting substitution rates at nucleotide sites by using homologous DNA sequences. The predictor is unbiased and "best" in the sense that it minimizes the mean squared error and maximizes the correlation between the predictor and the true value. It is also quite robust to errors in estimates of parameters in the model. A numerical example is given, with guidelines for the practical use of the approach. The most influential factor affecting the accuracy of prediction is the number of sequences; to get a correlation of over .7 between the predictor and the true value, about six to seven sequences are needed, depending on the overall similarity of the sequences.
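
The idea of a "best" predictor can be illustrated with a deliberately simplified sketch. This is not the method of the paper, which works with homologous sequences under a substitution model. It assumes instead a toy model in which the rate r at a site follows a gamma distribution with mean 1 (shape and rate both equal to alpha) and the number of substitutions x observed at that site is Poisson with mean r * total_length. Under those assumptions the conditional mean E[r | x], which minimizes the mean squared error, has a closed form. The names predict_rate, sample_poisson, simulate, and pearson, and the parameter total_length, are hypothetical and chosen only for this illustration.

```python
import math
import random


def predict_rate(x, alpha, total_length):
    """Posterior mean E[r | x] under the toy model (an assumption, not the paper's model):
    r ~ Gamma(shape=alpha, rate=alpha), so E[r] = 1, and x | r ~ Poisson(r * total_length).
    By gamma-Poisson conjugacy the posterior is Gamma(alpha + x, alpha + total_length)."""
    return (alpha + x) / (alpha + total_length)


def sample_poisson(mean, rng):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate(alpha=0.5, total_length=2.0, n_sites=20000, seed=1):
    """Simulate true rates and their predictions at n_sites independent sites."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); scale = 1/alpha gives mean 1.
    true_rates = [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]
    counts = [sample_poisson(r * total_length, rng) for r in true_rates]
    predicted = [predict_rate(x, alpha, total_length) for x in counts]
    return true_rates, predicted


def pearson(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    rates, preds = simulate()
    print("mean of predictions:", sum(preds) / len(preds))
    print("correlation with true rates:", pearson(rates, preds))
```

In this toy model the predictor is unbiased in the sense that its average equals the mean rate of 1, and its correlation with the true rate works out to sqrt(total_length / (alpha + total_length)). Adding sequences lengthens the tree and so raises that correlation, which is at least qualitatively in line with the abstract's remark that the number of sequences matters most.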
