
Direct maximum parsimony phylogeny reconstruction from genotype data.

Author information

Sridhar Srinath, Lam Fumei, Blelloch Guy E, Ravi R, Schwartz Russell

Affiliation

Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA.

Publication information

BMC Bioinformatics. 2007 Dec 5;8:472. doi: 10.1186/1471-2105-8-472.

Abstract

BACKGROUND

Maximum parsimony phylogenetic tree reconstruction from genetic variation data is a fundamental problem in computational genetics with many practical applications in population genetics, whole-genome analysis, and the search for genetic predictors of disease. Efficient methods are available for reconstructing maximum parsimony trees from haplotype data, but such data are difficult to determine directly for autosomal DNA. Data are more commonly available in the form of genotypes, which conflate the pair of haplotypes carried on homologous chromosomes: at a heterozygous site, a genotype records that both alleles are present but not which chromosome carries which. Currently, there are no general algorithms for the direct reconstruction of maximum parsimony phylogenies from genotype data. Phylogenetic applications for autosomal data must therefore rely on other methods to first infer haplotypes computationally from genotypes.
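
To make "conflated" concrete, here is a minimal Python sketch under an assumed encoding that is not taken from the paper: haplotypes are strings over '0'/'1', and a genotype marks each site '0' or '1' when both copies agree and '2' when they differ. The names `conflate` and `phasings` are hypothetical. A genotype with k heterozygous sites is consistent with 2^(k-1) distinct haplotype pairs (a single pair when k = 0), which is the ambiguity a separate phasing step has to resolve.

```python
from itertools import product


def conflate(h1: str, h2: str) -> str:
    """Merge two haplotypes into a genotype ('2' marks a heterozygous site)."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same sites")
    return "".join(a if a == b else "2" for a, b in zip(h1, h2))


def phasings(g: str) -> set[tuple[str, str]]:
    """All unordered haplotype pairs whose conflation is genotype g."""
    het = [i for i, c in enumerate(g) if c == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        h1, h2 = list(g), list(g)
        for i, b in zip(het, bits):
            # Put allele b on one copy and the other allele on the other copy.
            h1[i], h2[i] = b, ("1" if b == "0" else "0")
        # Sorting makes (x, y) and (y, x) the same unordered pair.
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return pairs


print(conflate("0011", "0101"))  # 0221
print(sorted(phasings("0221")))  # [('0001', '0111'), ('0011', '0101')]
```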

RESULTS

In this work, we develop the first practical method for computing maximum parsimony phylogenies directly from genotype data. We show that the standard practice of first inferring haplotypes from genotypes and then reconstructing a phylogeny on the haplotypes often substantially overestimates phylogeny size, that is, the number of mutations in the tree. As an immediate application, our method can be used to determine the minimum number of mutations required to explain a given set of observed genotypes.
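
The optimization target can be illustrated by brute force on toy inputs: try every phasing of every genotype and, for each resulting haplotype set, find the smallest tree in the Hamming hypercube (vertices are binary haplotypes, edges are single-site mutations, unobserved ancestral haplotypes allowed) that connects them, i.e. a minimum Steiner tree. This Python sketch uses the same assumed 0/1/2 encoding and hypothetical function names; it is an exhaustive illustration of the objective, not the authors' algorithm, and it is exponential, so it only handles a handful of sites.

```python
from itertools import combinations, product


def phasings(g: str) -> set[tuple[str, str]]:
    """Unordered haplotype pairs consistent with genotype g ('2' = heterozygous)."""
    het = [i for i, c in enumerate(g) if c == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        h1, h2 = list(g), list(g)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, ("1" if b == "0" else "0")
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return pairs


def neighbors(v: str):
    """Haplotypes one mutation (one flipped site) away from v."""
    for i, c in enumerate(v):
        yield v[:i] + ("1" if c == "0" else "0") + v[i + 1:]


def is_connected(nodes: frozenset) -> bool:
    """Does `nodes` induce a connected subgraph of the hypercube?"""
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in neighbors(stack.pop()):
            if w in nodes and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)


def steiner_size(terminals: set) -> int:
    """Fewest mutations (edges) in a hypercube tree containing all terminals.

    Tries extra (ancestral) haplotypes in order of increasing count, so the
    first connected vertex set found is minimal; a tree on it has |set| - 1 edges.
    """
    m = len(next(iter(terminals)))
    every = ("".join(bits) for bits in product("01", repeat=m))
    others = [v for v in every if v not in terminals]
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            if is_connected(frozenset(terminals).union(extra)):
                return len(terminals) + k - 1
    raise AssertionError("unreachable: the whole hypercube is connected")


def min_mutations(genotypes: list[str]) -> int:
    """Joint optimum: minimize tree size over all phasings at once."""
    options = [sorted(phasings(g)) for g in genotypes]
    return min(
        steiner_size({h for pair in choice for h in pair})
        for choice in product(*options)
    )


def two_stage(fixed_phasing: list[tuple[str, str]]) -> int:
    """Tree size when the phasing was fixed beforehand by a separate step."""
    return steiner_size({h for pair in fixed_phasing for h in pair})


genotypes = ["22", "00", "11"]
print(min_mutations(genotypes))                                # 2
print(two_stage([("01", "10"), ("00", "00"), ("11", "11")]))  # 3
```

In this toy case, phasing the heterozygous genotype as (00, 11) needs two mutations, while fixing the phasing (01, 10) in advance forces three, a small-scale version of the overestimation that arises when phasing and tree construction are done separately.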

CONCLUSION

Phylogeny reconstruction directly from unphased data is computationally feasible for moderate-sized problem instances and can lead to substantially more accurate tree-size inferences than the standard practice of treating phasing and phylogeny construction as two separate analysis stages. The difference between the approaches is particularly important for downstream applications that require a lower bound on the number of mutations the genetic region has undergone.
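
For comparison, a much cheaper lower bound needs no tree search at all: in a tree over binary haplotypes each edge changes exactly one site, so every site at which both alleles must appear costs at least one mutation. The Python sketch below (same assumed 0/1/2 encoding, hypothetical function name) counts such sites; the exact minimum over all phasings can equal or exceed this count but never fall below it.

```python
def polymorphic_site_bound(genotypes: list[str]) -> int:
    """Lower bound on mutations: sites where both alleles must be present.

    Encoding assumption: '0'/'1' = homozygous, '2' = heterozygous.
    """
    bound = 0
    for j in range(len(genotypes[0])):
        column = {g[j] for g in genotypes}
        # A heterozygous call, or homozygous 0 and 1 in different individuals,
        # means both alleles occur at site j in any explaining haplotype set.
        if "2" in column or {"0", "1"} <= column:
            bound += 1
    return bound


print(polymorphic_site_bound(["22", "00", "11"]))  # 2
print(polymorphic_site_bound(["020", "000"]))      # 1
```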
