Schouten Michael T, Williams Christopher K I, Haley Chris S
School of Informatics, University of Edinburgh, Edinburgh EH1 2QL, United Kingdom.
Genetics. 2005 Nov;171(3):1321-30. doi: 10.1534/genetics.105.042762. Epub 2005 Jun 8.
Recent studies have highlighted the dangers of using haplotypes reconstructed directly from population data in fine-scale mapping analyses. Family data may help resolve ambiguity, yet can be costly to obtain. This study addresses the following question: How much family data (if any) should be used to facilitate haplotype reconstruction in a population study? We conduct a simulation study to evaluate how changes in family information affect the accuracy of haplotype frequency estimates and phase reconstruction. To reconstruct haplotypes, we introduce an EM-based algorithm that can efficiently accommodate unrelated individuals, parent-child trios, and arbitrarily large half-sib pedigrees. Simulations are conducted for a diverse set of haplotype frequency distributions, all of which have been previously published in empirical studies. We present results on the effectiveness of using pedigree data in a population study within a coherent, unified framework, and provide insight into the properties of the haplotype frequency distribution that can influence experimental design. We show that a preliminary estimate of the haplotype frequency distribution can be valuable in large population studies with fixed resources.
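The abstract names an EM-based algorithm but gives no implementation details. As a rough illustration of the general idea only, the sketch below runs a textbook EM for haplotype frequencies on unrelated individuals. It is not the authors' algorithm and does not handle parent-child trios or half-sib pedigrees. The function names `compatible_pairs` and `em_haplotype_frequencies`, the coding of biallelic genotypes as 0/1/2 allele counts, and the uniform starting frequencies are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List ordered haplotype pairs consistent with one unphased genotype.

    genotype: tuple of allele counts per site (0, 1 or 2), an assumed coding.
    Heterozygous sites (count 1) are expanded in both phase orientations.
    """
    per_site = []
    for g in genotype:
        if g == 0:
            per_site.append([(0, 0)])
        elif g == 2:
            per_site.append([(1, 1)])
        else:
            per_site.append([(0, 1), (1, 0)])
    pairs = []
    for combo in product(*per_site):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        pairs.append((h1, h2))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unrelated, unphased genotypes."""
    expansions = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted({h for pairs in expansions for pair in pairs for h in pair})
    # Assumed starting point: uniform over every haplotype seen in any expansion.
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(n_iter):
        # E-step: expected haplotype counts given the current frequencies,
        # assuming random pairing of haplotypes within an individual.
        counts = defaultdict(float)
        for pairs in expansions:
            weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: renormalise expected counts over 2N chromosomes.
        new_freqs = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}
        change = max(abs(new_freqs[h] - freqs[h]) for h in haplotypes)
        freqs = new_freqs
        if change < tol:
            break
    return freqs


# Toy data: four 00/00 homozygotes, four 11/11 homozygotes, two double heterozygotes.
toy = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
estimates = em_haplotype_frequencies(toy)
```

In this toy data set the double heterozygotes are phase-ambiguous on their own, and the homozygotes pull the estimate towards the 00 and 11 haplotypes. Family data would add constraints of a similar kind, which is the trade-off the study examines.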