Yuan Ao, Chen Guanjie, Rotimi Charles, Bonney George E
Statistical Genetics and Bioinformatics Unit, Howard University, Washington, DC 20059, USA.
J Bioinform Comput Biol. 2005 Oct;3(5):1021-38. doi: 10.1142/s021972000500151x.
The existence of haplotype blocks transmitted from parents to offspring has recently been suggested, prompting interest in inferring block structure and length. The motivation is that well-characterized haplotype blocks would make it easier to map quickly all the genes involved in human diseases. To study haplotype block inference systematically, we propose a statistical framework. In this framework, optimal haplotype block partitioning is formulated as a statistical model selection problem; missing data can be handled in a standard statistical way; population strata can be accommodated; block structure inference and hypothesis testing can be performed; and prior knowledge, if available, can be incorporated to perform Bayesian inference. The algorithm runs in time linear in the number of loci, whereas many comparable approaches require solving NP-hard problems. We illustrate the method on both simulated and real data sets.
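To make the idea of block partitioning as model selection concrete, the following is a minimal sketch and not the authors' algorithm, which the abstract does not specify. It assumes phased haplotypes given as equal-length strings with no missing data, scores each candidate block with a BIC penalty on a multinomial model over the distinct sub-haplotypes in that block, and finds the best partition by dynamic programming. The names `block_cost`, `partition_blocks`, and `max_block_len` are hypothetical. Capping the block length is an assumption made here so that the number of candidate blocks scored grows linearly with the number of loci; how the paper achieves linearity is not stated in the abstract.

```python
import math
from collections import Counter


def block_cost(haplotypes, start, end):
    """BIC of a multinomial model over the sub-haplotypes at loci [start, end).

    Assumption: BIC is used as the model selection criterion for illustration.
    """
    n = len(haplotypes)
    counts = Counter(h[start:end] for h in haplotypes)
    # Maximized log-likelihood using observed sub-haplotype frequencies.
    log_lik = sum(c * math.log(c / n) for c in counts.values())
    # Free parameters: one frequency per distinct sub-haplotype, minus one.
    n_params = len(counts) - 1
    return -2.0 * log_lik + n_params * math.log(n)


def partition_blocks(haplotypes, max_block_len=10):
    """Return the list of (start, end) blocks minimizing the summed BIC.

    Assumption: blocks are contiguous, independent of each other, and at most
    max_block_len loci wide, so only O(n_loci * max_block_len) blocks are scored.
    """
    n_loci = len(haplotypes[0])
    best = [0.0] + [math.inf] * n_loci  # best[j]: minimal cost of loci [0, j)
    back = [0] * (n_loci + 1)           # back[j]: start of the last block ending at j
    for end in range(1, n_loci + 1):
        for start in range(max(0, end - max_block_len), end):
            cost = best[start] + block_cost(haplotypes, start, end)
            if cost < best[end]:
                best[end] = cost
                back[end] = start
    # Walk the back-pointers from the last locus to recover the partition.
    blocks = []
    end = n_loci
    while end > 0:
        blocks.append((back[end], end))
        end = back[end]
    return blocks[::-1]


# Toy data: loci 0-1 are perfectly linked, loci 2-3 are perfectly linked,
# and the two pairs are independent of each other.
sample = ["0000", "0011", "1100", "1111"] * 5
print(partition_blocks(sample))  # [(0, 2), (2, 4)]
```

On this toy sample, two blocks of two loci each give a lower total BIC than either a single four-locus block or four single-locus blocks, because each pair needs only one frequency parameter while the joint model needs three.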