Comparison of methods for calculating conditional expectations of sufficient statistics for continuous time Markov chains.

Affiliation

Bioinformatics Research Center, Aarhus University, Aarhus, Denmark.

Publication information

BMC Bioinformatics. 2011 Dec 5;12:465. doi: 10.1186/1471-2105-12-465.

DOI:10.1186/1471-2105-12-465
PMID:22142146
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3329461/
Abstract

BACKGROUND

Continuous-time Markov chains (CTMCs) are a widely used model for describing the evolution of DNA sequences at the nucleotide, amino acid, or codon level. The sufficient statistics for a CTMC are the time spent in each state and the number of changes between each pair of states. In applications, past evolutionary events (the exact times and types of changes) are inaccessible, so the past must be inferred from DNA sequence data observed in the present.
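
The sufficiency claim can be made concrete by writing out the likelihood of a fully observed path. The notation here is an assumption for illustration and does not come from the abstract: q_ab is the rate of change from state a to state b, q_a is the total rate of leaving state a (the sum of q_ab over b ≠ a), T_a is the total time spent in state a, N_ab is the number of changes from a to b on the observed interval, and the initial state is treated as given.

```latex
L(Q) = \prod_{a} \prod_{b \neq a} q_{ab}^{N_{ab}}
       \, \exp\!\left( -\sum_{a} q_a T_a \right),
\qquad
\log L(Q) = \sum_{a} \sum_{b \neq a} N_{ab} \log q_{ab} - \sum_{a} q_a T_a .
```

The path enters the likelihood only through T_a and N_ab. When only the end-points of the path are observed, these quantities are unknown and are typically replaced by their expectations conditioned on the end-points.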

RESULTS

We describe and implement three algorithms for computing linear combinations of expected values of the sufficient statistics, conditioned on the end-points of the chain, and compare their accuracy and running time. The first algorithm is based on an eigenvalue decomposition of the rate matrix (EVD), the second on uniformization (UNI), and the third on integrals of matrix exponentials (EXPM). An R implementation of the algorithms is available at http://www.birc.au.dk/~paula/.
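
As a rough illustration of the uniformization idea (UNI), the sketch below computes the conditional expectations for a small chain in pure Python. It is not the authors' R implementation: the function names, the simple Poisson-tail stopping rule, and the two-state example rate matrix are all assumptions made for illustration. It relies on the standard identities E[T_a | X_0 = i, X_T = j] = J_aa / P_ij(T) and E[N_ab | X_0 = i, X_T = j] = q_ab J_ab / P_ij(T), where J_ab is the integral over [0, T] of P_ia(t) P_bj(T - t), and on the series P(t) = sum over n of Poisson(n; mu t) R^n with R = I + Q/mu. It assumes a non-zero rate matrix, P_ij(T) > 0, and a moderate value of mu T.

```python
import math

def mat_mul(A, B):
    """Multiply two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def uniformization_stats(Q, T, i, j, tol=1e-12, max_terms=10_000):
    """Return (P_ij(T), expected times, expected jump counts) given X_0=i, X_T=j.

    Hypothetical helper: truncates the uniformization series once the
    accumulated Poisson mass exceeds 1 - tol.
    """
    n = len(Q)
    mu = max(-Q[a][a] for a in range(n))            # uniformization rate
    # R = I + Q/mu is a stochastic matrix.
    R = [[(1.0 if a == b else 0.0) + Q[a][b] / mu for b in range(n)]
         for a in range(n)]
    powers = [[[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]]
    pois = math.exp(-mu * T)                        # Poisson(0; mu*T)
    cum = pois                                      # accumulated Poisson mass
    p_ij = pois * powers[0][i][j]
    J = [[0.0] * n for _ in range(n)]               # J[a][b] = int P_ia(t) P_bj(T-t) dt
    k = 0
    while 1.0 - cum > tol and k < max_terms:
        k += 1
        pois *= mu * T / k                          # Poisson(k; mu*T)
        cum += pois
        # Series term for J: Poisson(k)/mu * sum_{m=0}^{k-1} (R^m)_ia (R^(k-1-m))_bj
        for a in range(n):
            for b in range(n):
                conv = sum(powers[m][i][a] * powers[k - 1 - m][b][j]
                           for m in range(k))
                J[a][b] += pois * conv / mu
        powers.append(mat_mul(powers[-1], R))       # now holds R^k
        p_ij += pois * powers[k][i][j]
    exp_time = [J[a][a] / p_ij for a in range(n)]
    exp_jumps = [[Q[a][b] * J[a][b] / p_ij if a != b else 0.0
                  for b in range(n)] for a in range(n)]
    return p_ij, exp_time, exp_jumps

# Illustrative two-state chain: rate 1 for 0 -> 1 and rate 2 for 1 -> 0.
Q = [[-1.0, 1.0], [2.0, -2.0]]
T = 1.5
p, exp_time, exp_jumps = uniformization_stats(Q, T, i=0, j=1)
print("P_01(T):", p)
print("expected time in each state:", exp_time)
print("expected number of changes:", exp_jumps)
```

Two quick sanity checks apply to this example: the expected times should sum to T, and for a two-state chain that starts in state 0 and ends in state 1 the expected number of 0 → 1 changes should exceed the expected number of 1 → 0 changes by exactly one. The inner sum is recomputed for every series term, so the cost is quadratic in the truncation point; the sketch is meant for small examples only.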

CONCLUSIONS

We use two different models to analyze the accuracy and eight experiments to investigate the speed of the three algorithms. We find that they have similar accuracy and that EXPM is the slowest method. Furthermore, we find that UNI is usually faster than EVD.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c543/3329461/bfc3e84a1cee/1471-2105-12-465-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c543/3329461/a6c4d8b17cb0/1471-2105-12-465-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c543/3329461/b7d14e0a5549/1471-2105-12-465-2.jpg
