A hierarchical model for incomplete alignments in phylogenetic inference.

Author information

Cheng Fuxia, Hartmann Stefanie, Gupta Mayetri, Ibrahim Joseph G, Vision Todd J

Affiliation

Department of Mathematics, Illinois State University, Normal, IL, USA.

Publication information

Bioinformatics. 2009 Mar 1;25(5):592-8. doi: 10.1093/bioinformatics/btp015. Epub 2009 Jan 15.

Abstract

MOTIVATION

Full-length DNA and protein sequences that span the entire length of a gene are ideally used for multiple sequence alignments (MSAs) and the subsequent inference of their relationships. Frequently, however, MSAs contain a substantial amount of missing data. For example, expressed sequence tags (ESTs), which are partial sequences of expressed genes, are the predominant source of sequence data for many organisms. The patterns of missing data typical for EST-derived alignments greatly compromise the accuracy of estimated phylogenies.

RESULTS

We present a statistical method for inferring phylogenetic trees from EST-based incomplete MSA data. We propose a class of hierarchical models for modeling pairwise distances between the sequences, and develop a fully Bayesian approach for estimation of the model parameters. Once the distance matrix is estimated, the phylogenetic tree may be constructed by applying neighbor-joining (or any other algorithm of choice). We also show that maximizing the marginal likelihood from the Bayesian approach yields similar results to a profile likelihood estimation. The proposed methods are illustrated using simulated protein families, for which the true phylogeny is known, and one real protein family.

AVAILABILITY

R code for fitting these models is available from: http://people.bu.edu/gupta/software.htm.
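
The last step described under RESULTS, building a tree from an estimated distance matrix by neighbor-joining, can be illustrated with a minimal sketch. This is not the authors' R code and does not implement their hierarchical Bayesian estimation of the distances; it assumes a complete distance matrix is already available. The function name `neighbor_joining`, the internal node labels, and the five-taxon toy matrix are all hypothetical and chosen only for illustration.

```python
import numpy as np


def neighbor_joining(dist, names):
    """Build an unrooted tree from a symmetric distance matrix.

    Returns a list of (node, neighbor, branch_length) edges.
    Illustrative sketch only; internal node labels are made up here.
    """
    d = np.array(dist, dtype=float)
    nodes = list(names)
    edges = []
    next_id = 0

    while len(nodes) > 2:
        n = len(nodes)
        row_sums = d.sum(axis=1)

        # Q criterion: the pair with the smallest value is joined next.
        q = (n - 2) * d - row_sums[:, None] - row_sums[None, :]
        np.fill_diagonal(q, np.inf)
        i, j = np.unravel_index(np.argmin(q), q.shape)

        # Branch lengths from the joined pair to the new internal node.
        len_i = 0.5 * d[i, j] + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
        len_j = d[i, j] - len_i
        new_node = f"node{next_id}"
        next_id += 1
        edges.append((nodes[i], new_node, len_i))
        edges.append((nodes[j], new_node, len_j))

        # Distances from the new node to every remaining node.
        d_new = 0.5 * (d[i, :] + d[j, :] - d[i, j])
        keep = [k for k in range(n) if k not in (i, j)]

        # Reduced matrix: remaining nodes first, new node last.
        reduced = np.zeros((n - 1, n - 1))
        reduced[:-1, :-1] = d[np.ix_(keep, keep)]
        reduced[-1, :-1] = d_new[keep]
        reduced[:-1, -1] = d_new[keep]
        d = reduced
        nodes = [nodes[k] for k in keep] + [new_node]

    # Connect the final two nodes.
    edges.append((nodes[0], nodes[1], d[0, 1]))
    return edges


# Toy additive distance matrix (hypothetical, not data from the paper).
taxa = ["a", "b", "c", "d", "e"]
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(toy, taxa)
for node, neighbor, length in tree:
    print(f"{node} -- {neighbor}: {length:.2f}")
```

Because neighbor-joining takes only the distance matrix as input, any estimator of pairwise distances can be placed in front of it, which is why the abstract notes that another tree-building algorithm could be substituted at this step.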
