Statistical alignment with a sequence evolution model allowing rate heterogeneity along the sequence.

Authors

Arribas-Gil Ana, Metzler Dirk, Plouhinec Jean-Louis

Affiliation

Departamento de Estadística, Universidad Carlos III de Madrid, Getafe, Spain.

Publication

IEEE/ACM Trans Comput Biol Bioinform. 2009 Apr-Jun;6(2):281-95. doi: 10.1109/TCBB.2007.70246.

Abstract

We present a stochastic sequence evolution model to obtain alignments and estimate mutation rates between two homologous sequences. The model allows two possible evolutionary behaviors along a DNA sequence in order to determine conserved regions and take its heterogeneity into account. In our model, the sequence is divided into slow and fast evolution regions. The boundaries between these sections are not known. It is our aim to detect them. The evolution model is based on a fragment insertion and deletion process working on fast regions only and on a substitution process working on fast and slow regions with different rates. This model induces a pair hidden Markov structure at the level of alignments, thus making efficient statistical alignment algorithms possible. We propose two complementary estimation methods, namely, a Gibbs sampler for Bayesian estimation and a stochastic version of the EM algorithm for maximum likelihood estimation. Both algorithms involve the sampling of alignments. We propose a partial alignment sampler, which is computationally less expensive than the typical whole alignment sampler. We show the convergence of the two estimation algorithms when used with this partial sampler. Our algorithms provide consistent estimates for the mutation rates and plausible alignments and sequence segmentations on both simulated and real data.
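As a rough illustration of the slow/fast segmentation idea only, the sketch below simulates a pair of sequences whose sites switch between a slow and a fast regime. It is not the authors' model: the regime switching by a two-state Markov chain, the Jukes-Cantor substitution probabilities, and every name and parameter (`simulate_pair`, `d_slow`, `d_fast`, `p_switch`) are assumptions made for illustration, and the fragment insertion and deletion process acting on fast regions is omitted entirely.

```python
import math
import random


def jc69_diff_prob(d):
    """Jukes-Cantor probability that a site differs after d expected
    substitutions per site (assumed substitution model, not from the paper)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def simulate_pair(length, d_slow, d_fast, p_switch, seed=0):
    """Simulate an ungapped ancestor/descendant pair with hidden regimes.

    Each site is in regime 'S' (slow) or 'F' (fast). The regime flips with
    probability p_switch from one site to the next (hypothetical choice).
    Indels are deliberately left out of this sketch.
    """
    rng = random.Random(seed)
    state = rng.choice("SF")
    ancestor, descendant, regimes = [], [], []
    for _ in range(length):
        if rng.random() < p_switch:
            state = "F" if state == "S" else "S"
        base = rng.choice("ACGT")
        p_diff = jc69_diff_prob(d_fast if state == "F" else d_slow)
        if rng.random() < p_diff:
            # Substitute with one of the three other bases, chosen uniformly.
            new_base = rng.choice([b for b in "ACGT" if b != base])
        else:
            new_base = base
        ancestor.append(base)
        descendant.append(new_base)
        regimes.append(state)
    return "".join(ancestor), "".join(descendant), "".join(regimes)


if __name__ == "__main__":
    anc, desc, reg = simulate_pair(60, d_slow=0.02, d_fast=0.8, p_switch=0.05)
    print(anc)
    print(desc)
    print(reg)
```

In such simulated data the regime string is known, so it could serve as a reference when checking whether an estimation method recovers the boundaries between conserved and variable regions.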
