A hierarchical model for incomplete alignments in phylogenetic inference.

Authors

Cheng Fuxia, Hartmann Stefanie, Gupta Mayetri, Ibrahim Joseph G, Vision Todd J

Affiliation

Department of Mathematics, Illinois State University, Normal, IL, USA.

Publication information

Bioinformatics. 2009 Mar 1;25(5):592-8. doi: 10.1093/bioinformatics/btp015. Epub 2009 Jan 15.

DOI: 10.1093/bioinformatics/btp015
PMID: 19147663
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2647833/
Abstract

MOTIVATION

Full-length DNA and protein sequences that span the entire length of a gene are ideally used for multiple sequence alignments (MSAs) and the subsequent inference of their relationships. Frequently, however, MSAs contain a substantial amount of missing data. For example, expressed sequence tags (ESTs), which are partial sequences of expressed genes, are the predominant source of sequence data for many organisms. The patterns of missing data typical for EST-derived alignments greatly compromise the accuracy of estimated phylogenies.

RESULTS

We present a statistical method for inferring phylogenetic trees from EST-based incomplete MSA data. We propose a class of hierarchical models for modeling pairwise distances between the sequences, and develop a fully Bayesian approach for estimation of the model parameters. Once the distance matrix is estimated, the phylogenetic tree may be constructed by applying neighbor-joining (or any other algorithm of choice). We also show that maximizing the marginal likelihood from the Bayesian approach yields similar results to a profile likelihood estimation. The proposed methods are illustrated using simulated protein families, for which the true phylogeny is known, and one real protein family.

AVAILABILITY

R code for fitting these models is available from: http://people.bu.edu/gupta/software.htm.
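
To make the problem described under MOTIVATION and RESULTS concrete, the following Python sketch computes naive pairwise distances from an incomplete alignment, using only the sites observed in both sequences. It is not the authors' hierarchical Bayesian model; it only illustrates the kind of input that model is meant to improve on. The toy sequences, the set of missing-data symbols, and the function names `p_distance` and `distance_table` are assumptions made for illustration.

```python
from itertools import combinations

# Assumption: these symbols mark missing or unobserved alignment positions.
MISSING = {"-", "?"}


def p_distance(a, b):
    """Proportion of differing sites among positions observed in both sequences.

    Returns (distance, n_shared). The distance is None when the two
    sequences share no observed positions, so no estimate is possible.
    """
    shared = [(x, y) for x, y in zip(a, b)
              if x not in MISSING and y not in MISSING]
    if not shared:
        return None, 0
    diffs = sum(1 for x, y in shared if x != y)
    return diffs / len(shared), len(shared)


def distance_table(alignment):
    """Naive pairwise distances for every pair of sequences in the alignment."""
    return {(s, t): p_distance(alignment[s], alignment[t])
            for s, t in combinations(alignment, 2)}


# Hypothetical toy alignment: one full-length sequence and three
# EST-like fragments that each cover only part of the gene.
alignment = {
    "full": "MKTAYIAKQR",
    "est5": "MKTAY-----",
    "est3": "-----IAKHR",
    "mid":  "--TGYIAK--",
}

table = distance_table(alignment)

for (s, t), (d, n) in table.items():
    shown = "undefined" if d is None else f"{d:.3f}"
    print(f"{s:>4} vs {t:<4}  distance={shown:<9}  shared sites={n}")
```

In this toy case the two non-overlapping fragments (`est5` and `est3`) have no defined distance, and the other estimates rest on as few as three shared sites. A distance-based tree method such as neighbor-joining needs a complete distance matrix, which is the gap a model-based estimate of the pairwise distances is intended to fill.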
