Efficient haplotype inference from pedigrees with missing data using linear systems with disjoint-set data structures.

Authors

Li Xin, Li Jing

Affiliation

Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, USA.

Publication

Comput Syst Bioinformatics Conf. 2008;7:297-308.

Abstract

We study the haplotype inference problem from pedigree data under the zero-recombination assumption, which is well supported by real data for tightly linked markers (i.e., single nucleotide polymorphisms (SNPs)) over a relatively large chromosome segment. We solve the problem in a rigorous mathematical manner by formulating genotype constraints as a linear system of inheritance variables. We then use disjoint-set structures to encode connectivity information among individuals, to detect constraints from genotypes, and to check the consistency of constraints. On a tree pedigree without missing data, our algorithm outputs a general solution as well as the total number of specific solutions in nearly linear time O(mn · α(n)), where m is the number of loci, n is the number of individuals, and α is the inverse Ackermann function; this is a further improvement over existing algorithms. We also extend the idea to looped pedigrees and pedigrees with missing data by considering existing (partial) constraints on inheritance variables. The algorithm has been implemented in C++ and will be incorporated into our PedPhase package. Experimental results show that it can correctly identify all 0-recombinant solutions with great efficiency. Comparisons with two other popular algorithms show that the proposed algorithm achieves 10- to 10⁵-fold improvements over a variety of parameter settings. The experimental study also provides empirical evidence for the complexity bounds suggested by theoretical analysis.
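To make the role of the disjoint-set structure more concrete, the sketch below shows one generic way such a structure can check the consistency of a linear system over binary variables and count its solutions. It is not the authors' algorithm: it assumes, purely for illustration, that every constraint has the pairwise form x_u XOR x_v = c, which the abstract does not confirm. The names `ParityDSU`, `add`, and `count_solutions` are hypothetical.

```python
class ParityDSU:
    """Disjoint-set forest that also stores each node's parity relative to its root.

    Illustrative assumption: variables are binary and every constraint
    has the form x_u XOR x_v = c with c in {0, 1}.
    """

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.parity = [0] * n  # parity[v] = x_v XOR x_parent[v]
        self.components = n    # each component contributes one free variable

    def find(self, v):
        """Return (root, x_v XOR x_root), compressing the path on the way."""
        path = []
        while self.parent[v] != v:
            path.append(v)
            v = self.parent[v]
        root = v
        acc = 0
        # Start at the node closest to the root and accumulate parities.
        for node in reversed(path):
            acc ^= self.parity[node]
            self.parity[node] = acc
            self.parent[node] = root
        return root, (self.parity[path[0]] if path else 0)

    def add(self, u, v, c):
        """Add the constraint x_u XOR x_v = c; return False if it contradicts earlier ones."""
        ru, pu = self.find(u)
        rv, pv = self.find(v)
        if ru == rv:
            # Both variables are already linked: the implied parity must match c.
            return (pu ^ pv) == c
        # Union by rank; the new edge stores x_child_root XOR x_parent_root.
        if self.rank[ru] < self.rank[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.parity[rv] = pu ^ pv ^ c
        if self.rank[ru] == self.rank[rv]:
            self.rank[ru] += 1
        self.components -= 1
        return True


def count_solutions(n, constraints):
    """Return the number of assignments satisfying all constraints (0 if inconsistent)."""
    dsu = ParityDSU(n)
    for u, v, c in constraints:
        if not dsu.add(u, v, c):
            return 0
    return 2 ** dsu.components
```

Each `add` call costs nearly constant amortized time because of union by rank and path compression, which is where an inverse-Ackermann factor such as α(n) comes from in bounds of this kind. Under the stated assumption, a consistent system leaves one free binary choice per connected component, so the solution count is 2 raised to the number of components.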
