Suppr超能文献

用于在系谱上重建零重组单倍型结构的线性时间算法。

A linear-time algorithm for reconstructing zero-recombinant haplotype configuration on a pedigree.

机构信息

Institute of Biomedical Informatics, National Yang Ming University, Taipei 112, Taiwan.

出版信息

BMC Bioinformatics. 2012;13 Suppl 17(Suppl 17):S19. doi: 10.1186/1471-2105-13-S17-S19. Epub 2012 Dec 13.

Abstract

BACKGROUND

When studying genetic diseases in which genetic variations are passed on to offspring, the ability to distinguish between paternal and maternal alleles is essential. Determining haplotypes from genotype data is called haplotype inference. Most existing computational algorithms for haplotype inference have been designed to use genotype data collected from individuals in the form of a pedigree. A haplotype is regarded as a hereditary unit and therefore input pedigrees are preferred that are free of mutational events and have a minimum number of genetic recombinational events. These ideas motivated the zero-recombinant haplotype configuration (ZRHC) problem, which strictly follows the Mendelian law of inheritance, namely that one haplotype of each child is inherited from the father and the other haplotype is inherited from the mother, both without any mutation. So far no linear-time algorithm for ZRHC has been proposed for general pedigrees, even though the number of mating loops in a human pedigree is usually very small and can be regarded as constant.

RESULTS

Given a pedigree with n individuals, m marker loci, and k mating loops, we proposed an algorithm that can provide a general solution to the zero-recombinant haplotype configuration problem in O(kmn + k2m) time. In addition, this algorithm can be modified to detect inconsistencies within the genotype data without loss of efficiency. The proposed algorithm was subject to 12000 experiments to verify its performance using different (n, m) combinations. The value of k was uniformly distributed between zero and six throughout all experiments. The experimental results show a great linearity in terms of execution time in relation to input size when both n and m are larger than 100. For those experiments where n or m are less than 100, the proposed algorithm runs very fast, in thousandth to hundredth of a second, on a personal desktop computer.

CONCLUSIONS

We have developed the first deterministic linear-time algorithm for the zero-recombinant haplotype configuration problem. Our experimental results demonstrated the linearity of its execution time in relation to the input size. The proposed algorithm can be modified to detect inconsistency within the genotype data without loss of efficiency and is expected to be able to handle recombinant and missing data with further extension.

摘要

背景

在研究遗传疾病时,遗传变异会遗传给后代,因此区分父本和母本等位基因的能力至关重要。从基因型数据中推断出单倍型称为单倍型推断。大多数现有的用于单倍型推断的计算算法都是为使用以谱系形式收集的个体的基因型数据而设计的。单倍型被视为遗传单位,因此输入的谱系最好没有突变事件,并且具有最小数量的遗传重组事件。这些想法激发了零重组单倍型构型(ZRHC)问题的产生,该问题严格遵循孟德尔遗传定律,即每个孩子的一个单倍型来自父亲,另一个单倍型来自母亲,两者都没有任何突变。到目前为止,即使人类谱系中的交配环数量通常非常少,可以视为常数,但尚未提出针对一般谱系的 ZRHC 的线性时间算法。

结果

对于具有 n 个人、m 个标记基因座和 k 个交配环的谱系,我们提出了一种算法,可以在 O(kmn + k2m)时间内为零重组单倍型构型问题提供通用解决方案。此外,该算法可以修改为在不降低效率的情况下检测基因型数据中的不一致性。所提出的算法经过 12000 次实验验证了其使用不同(n,m)组合的性能。在所有实验中,k 的值在零到六之间均匀分布。实验结果表明,当 n 和 m 都大于 100 时,执行时间与输入大小之间具有很好的线性关系。对于那些 n 或 m 小于 100 的实验,所提出的算法在个人台式计算机上的运行速度非常快,只需千分之一到百分之一秒。

结论

我们已经开发了第一个用于零重组单倍型构型问题的确定性线性时间算法。我们的实验结果表明,其执行时间与输入大小之间具有线性关系。可以修改所提出的算法以检测基因型数据中的不一致性而不会降低效率,并且有望通过进一步扩展能够处理重组和缺失数据。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2233/3521470/423bf70c188d/1471-2105-13-S17-S19-1.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验