Efficient identification of identical-by-descent status in pedigrees with many untyped individuals.

Affiliation

Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, USA.

Publication information

Bioinformatics. 2010 Jun 15;26(12):i191-8. doi: 10.1093/bioinformatics/btq222.

Abstract

MOTIVATION

Inference of identical-by-descent (IBD) probabilities is central to family-based linkage analysis. With high-density single nucleotide polymorphism (SNP) markers, one can almost always infer the haplotype configuration of each family member, provided all individuals are typed. The IBD status can then be obtained directly from the haplotype configurations. In reality, however, many family members are not typed for practical reasons. IBD/haplotype inference is much harder when untyped individuals are treated as missing.

RESULTS

We present a novel hidden Markov model (HMM) approach to infer the IBD status in a pedigree with many untyped members using high-density SNP markers. We introduce the concept of an inheritance-generating function, defined for any pair of alleles in a descent graph based on a pedigree structure, and derive a recursive formula for calculating it efficiently. By aggregating all possible inheritance patterns via an explicit representation of the number and lengths of all possible paths between two alleles, the inheritance-generating function provides a convenient way to derive the transition probabilities of the HMM theoretically. We further extend the basic HMM to incorporate population linkage disequilibrium (LD). Pedigree-wise IBD sharing can be constructed from pair-wise IBD relationships. Compared with traditional approaches to linkage analysis, our new model can efficiently infer IBD status without enumerating all possible genotypes and transmission patterns of untyped members in a family. Our approach can be reliably applied to large pedigrees with many untyped members, and the inferred IBD status can be used for non-parametric genome-wide linkage analysis.
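To make the general idea of HMM-based pairwise IBD inference concrete, the following is a minimal Python sketch of a two-state HMM (IBD / not IBD) run along a sequence of markers with the forward-backward algorithm. It is not the authors' model: the abstract does not give the transition probabilities derived from the inheritance-generating function, the LD extension, or the emission model, so the function name `ibd_posterior` and all parameters (`p_ibd`, `switch`, `err`, `p_match_nonibd`) are hypothetical placeholders chosen only for illustration.

```python
def ibd_posterior(match, p_ibd=0.25, switch=0.05, err=0.01, p_match_nonibd=0.5):
    """Posterior probability of IBD at each marker for one pair of haplotypes.

    match: list of bools, True where the two alleles agree at a marker.
    All numeric parameters are illustrative assumptions, not values from
    the paper.
    """
    # Hidden states: 0 = not IBD, 1 = IBD.
    init = [1.0 - p_ibd, p_ibd]

    # Assumed transitions between adjacent markers, scaled so that
    # `init` is the stationary distribution of the chain.
    a01 = switch * p_ibd          # not IBD -> IBD
    a10 = switch * (1.0 - p_ibd)  # IBD -> not IBD
    trans = [[1.0 - a01, a01], [a10, 1.0 - a10]]

    def emit(state, same):
        # Assumed emission model: IBD alleles agree unless a typing
        # error occurs; non-IBD alleles agree by chance.
        p_same = (1.0 - err) if state == 1 else p_match_nonibd
        return p_same if same else 1.0 - p_same

    n = len(match)

    # Forward pass, normalised at each marker to avoid underflow.
    fwd = []
    prev = None
    for t, same in enumerate(match):
        if t == 0:
            row = [init[s] * emit(s, same) for s in (0, 1)]
        else:
            row = [
                sum(prev[r] * trans[r][s] for r in (0, 1)) * emit(s, same)
                for s in (0, 1)
            ]
        z = sum(row)
        row = [x / z for x in row]
        fwd.append(row)
        prev = row

    # Backward pass, also normalised at each marker.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        row = [
            sum(trans[s][r] * emit(r, match[t + 1]) * bwd[t + 1][r] for r in (0, 1))
            for s in (0, 1)
        ]
        z = sum(row)
        bwd[t] = [x / z for x in row]

    # Combine both passes into a per-marker posterior P(IBD | all markers).
    post = []
    for f, b in zip(fwd, bwd):
        w0, w1 = f[0] * b[0], f[1] * b[1]
        post.append(w1 / (w0 + w1))
    return post
```

For example, `ibd_posterior([True] * 10 + [False] * 10)` yields posteriors that are high over the matching stretch and low over the mismatching stretch. In the approach described above, the hand-picked `trans` matrix would be replaced by transition probabilities derived from the pedigree structure.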

AVAILABILITY

The algorithm is implemented in Matlab and is freely available upon request.

SUPPLEMENTARY INFORMATION

Supplementary data are available on Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ad0b/2881406/aa5d724b09b8/btq222f1.jpg
