A hierarchical model for incomplete alignments in phylogenetic inference.

Author information

Cheng Fuxia, Hartmann Stefanie, Gupta Mayetri, Ibrahim Joseph G, Vision Todd J

Affiliation

Department of Mathematics, Illinois State University, Normal, IL, USA.

Publication information

Bioinformatics. 2009 Mar 1;25(5):592-8. doi: 10.1093/bioinformatics/btp015. Epub 2009 Jan 15.

DOI:10.1093/bioinformatics/btp015

PMID:19147663

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2647833/

Abstract

MOTIVATION

Full-length DNA and protein sequences that span the entire length of a gene are ideally used for multiple sequence alignments (MSAs) and the subsequent inference of their relationships. Frequently, however, MSAs contain a substantial amount of missing data. For example, expressed sequence tags (ESTs), which are partial sequences of expressed genes, are the predominant source of sequence data for many organisms. The patterns of missing data typical for EST-derived alignments greatly compromise the accuracy of estimated phylogenies.

RESULTS

We present a statistical method for inferring phylogenetic trees from EST-based incomplete MSA data. We propose a class of hierarchical models for modeling pairwise distances between the sequences, and develop a fully Bayesian approach for estimation of the model parameters. Once the distance matrix is estimated, the phylogenetic tree may be constructed by applying neighbor-joining (or any other algorithm of choice). We also show that maximizing the marginal likelihood from the Bayesian approach yields similar results to a profile likelihood estimation. The proposed methods are illustrated using simulated protein families, for which the true phylogeny is known, and one real protein family.

AVAILABILITY

R code for fitting these models is available from: http://people.bu.edu/gupta/software.htm.
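
To make the missing-data problem described under MOTIVATION concrete, the following minimal Python sketch computes a naive pairwise distance (the proportion of differing residues) using only the alignment columns observed in both sequences. It is an illustration under stated assumptions, not the hierarchical Bayesian model of the paper: the function name `pairwise_p_distances`, the gap symbol `-`, and the toy sequences are all hypothetical. It shows how two non-overlapping EST-like fragments leave an entry of the distance matrix undefined, which is the kind of gap a model-based estimate of the distance matrix is meant to address.

```python
from itertools import combinations

GAP = "-"  # assumed symbol for a missing (unsequenced) alignment position


def pairwise_p_distances(alignment):
    """Return {(name_a, name_b): distance or None} for every sequence pair.

    The distance is the fraction of mismatched residues among columns
    observed in both sequences; it is None when no column is shared.
    """
    distances = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2):
        # Keep only the columns where neither sequence is missing.
        shared = [(x, y) for x, y in zip(seq_a, seq_b) if x != GAP and y != GAP]
        if not shared:
            distances[(name_a, name_b)] = None  # no overlap: distance undefined
        else:
            mismatches = sum(1 for x, y in shared if x != y)
            distances[(name_a, name_b)] = mismatches / len(shared)
    return distances


# Hypothetical toy alignment: one full-length sequence and two partial fragments.
toy_alignment = {
    "full": "MKTAYIAKQR",
    "est_5": "MKTGY-----",  # covers only the first half
    "est_3": "-----IAKQK",  # covers only the second half
}
result = pairwise_p_distances(toy_alignment)
```

In this toy case each fragment can be compared with the full-length sequence, but the two fragments share no columns, so their distance cannot be computed directly and a tree-building method such as neighbor-joining could not be applied to the resulting matrix without first filling in that entry.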
