Shanghai Jiao Tong University, Department of Bioinformatics and Biostatistics, Shanghai, 200240, China.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab276.
With the growth of genome-wide association studies, extracting information from large-scale data has become a common concern, because traditional methods are not well suited to problems such as identifying locus-to-locus interactions (also known as epistasis). Previous epistasis studies focused mainly on local information with a single outcome (phenotype). In this paper, we develop a two-stage global search algorithm, Greedy Equivalence Search with Local Modification (GESLM), which searches globally over directed acyclic graphs to identify genome-wide epistatic interactions with multiple outcome variables (phenotypes) in a case-control design. GESLM combines the advantages of score-based and constraint-based methods to learn the phenotype-related Bayesian network, and it is powerful and robust in finding interaction structures that display both genetic associations with phenotypes and gene-gene interactions. In simulation studies, we compared GESLM with several common methods for detecting phenotype-related loci. The results showed that our method improved accuracy and efficiency relative to the other methods, especially in unbalanced case-control studies. In addition, its application to the UK Biobank dataset suggested that the algorithm performs well on genome-wide association data with more than one phenotype.
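To make the idea of a score-based search concrete, the following is a minimal sketch and not the GESLM algorithm itself. It greedily selects parent SNPs for a single phenotype node by maximizing a BIC score on discrete data. The function names (`local_bic`, `greedy_parents`), the variable names (`snp_a`, `snp_b`, `snp_c`, `pheno`), and the toy dataset are all assumptions made for illustration; the abstract does not specify the scoring function or the search moves that GESLM uses.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def local_bic(rows, child, parents):
    """BIC of one discrete node given a parent set (higher is better)."""
    n = len(rows)
    # counts[parent configuration][child state] -> number of rows
    counts = defaultdict(Counter)
    for row in rows:
        counts[tuple(row[p] for p in parents)][row[child]] += 1
    # Maximized multinomial log-likelihood of the child given its parents.
    loglik = sum(
        c * math.log(c / sum(states.values()))
        for states in counts.values()
        for c in states.values()
    )
    # Free parameters: (child states - 1) per parent configuration.
    child_states = len({row[child] for row in rows})
    parent_configs = math.prod(len({row[p] for row in rows}) for p in parents)
    n_params = (child_states - 1) * parent_configs
    return loglik - 0.5 * math.log(n) * n_params


def greedy_parents(rows, child, candidates):
    """Add one parent at a time while the BIC score keeps improving."""
    parents = []
    best = local_bic(rows, child, parents)
    while True:
        scored = [
            (local_bic(rows, child, parents + [c]), c)
            for c in candidates
            if c not in parents
        ]
        if not scored:
            break
        score, pick = max(scored)
        if score <= best:
            break
        parents.append(pick)
        best = score
    return parents


# Hypothetical toy data: the phenotype is present only when both snp_a and
# snp_b carry the risk allele; snp_c is unrelated to the phenotype.
rows = [
    {"snp_a": a, "snp_b": b, "snp_c": c, "pheno": a & b}
    for a, b, c in product((0, 1), repeat=3)
    for _ in range(25)
]

selected = sorted(greedy_parents(rows, "pheno", ["snp_a", "snp_b", "snp_c"]))
print(selected)  # ['snp_a', 'snp_b']
```

One limitation of this simplified sketch: because it adds a single parent per step, it can miss interactions in which neither locus has a marginal effect on the phenotype (for example, an exclusive-or pattern), since no single addition improves the score.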