IEEE/ACM Trans Comput Biol Bioinform. 2019 Mar-Apr;16(2):638-649. doi: 10.1109/TCBB.2017.2786239. Epub 2017 Dec 22.
Association mapping of genetic diseases has attracted extensive research interest in recent years. However, most methodologies introduced so far suffer from spurious inference of associated sites due to population inhomogeneities. In this paper, we introduce a statistical framework that compensates for this shortcoming by equipping current methodologies with a state-of-the-art clustering algorithm widely used in population genetics. The proposed framework jointly infers the disease-associated factors and the hidden population structure. To this end, a Markov chain Monte Carlo (MCMC) procedure is employed to estimate the posterior probability distribution of the model parameters. We have implemented the proposed framework in a software package whose performance is extensively evaluated on a number of synthetic datasets and compared with well-known existing methods such as STRUCTURE. In extreme scenarios, an improvement of up to 10-15 percent in inference accuracy is achieved, with a moderate increase in computational complexity.