Ma Yanran, Fa Botao, Yuan Xin, Zhang Yue, Yu Zhangsheng
Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.
Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Xi'an Jiaotong University, Xi'an, China.
Front Genet. 2022 Sep 15;13:942464. doi: 10.3389/fgene.2022.942464. eCollection 2022.
Identifying the causal SNPs of complex diseases in large-scale genome-wide association analysis benefits the study of the pathogenesis, prevention, diagnosis, and treatment of these diseases. However, existing methods applicable to large-scale data suffer from low accuracy, so powerful and accurate methods for detecting SNPs associated with complex diseases are highly desirable. We propose a score-based two-stage Bayesian network method to identify causal SNPs of complex diseases in case-control designs. The method combines ideas from constraint-based methods and score-and-search methods to learn the structure of the disease-centered local Bayesian network. Simulation experiments compare the new algorithm with several common methods that serve the same purpose. The results show that our method improves accuracy and stability relative to these methods, and that it yields lower false-positive rates when all correct loci are detected. In addition, an application to real-world data suggests that the algorithm performs well on genome-wide association data. The proposed method is designed to identify SNPs related to complex diseases and is more accurate than other methods that can also be adapted to data from large-scale genome-wide analysis studies.
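The two-stage idea summarized in the abstract (a constraint-style screen followed by a score-and-search step around the disease node) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the chi-square screen, the BIC score, the greedy forward search, the 5.99 threshold, and every function name below are assumptions chosen only to make the general pattern concrete on simulated case-control data.

```python
import math
import random
from collections import Counter


def chi_square_stat(snp, disease):
    """Pearson chi-square statistic for a genotype-by-status contingency table."""
    n = len(disease)
    joint = Counter(zip(snp, disease))
    g_counts = Counter(snp)
    d_counts = Counter(disease)
    stat = 0.0
    for g, ng in g_counts.items():
        for d, nd in d_counts.items():
            expected = ng * nd / n
            stat += (joint[(g, d)] - expected) ** 2 / expected
    return stat


def bic_score(parents, genotypes, disease):
    """BIC of the disease node given a tuple of parent SNP indices."""
    n = len(disease)
    config_counts = Counter()
    joint_counts = Counter()
    for row, d in zip(genotypes, disease):
        config = tuple(row[j] for j in parents)
        config_counts[config] += 1
        joint_counts[(config, d)] += 1
    loglik = sum(c * math.log(c / config_counts[config])
                 for (config, _), c in joint_counts.items())
    # One free probability per genotype configuration (3 genotypes per SNP).
    n_params = 3 ** len(parents)
    return loglik - 0.5 * n_params * math.log(n)


def two_stage_select(genotypes, disease, chi2_threshold=5.99):
    """Hypothetical two-stage selection of SNPs linked to the disease node."""
    n_snps = len(genotypes[0])
    # Stage 1 (assumed screen): keep SNPs marginally associated with disease.
    # 5.99 is roughly the 0.05 critical value of chi-square with 2 d.f.
    candidates = [
        j for j in range(n_snps)
        if chi_square_stat([row[j] for row in genotypes], disease) > chi2_threshold
    ]
    # Stage 2 (assumed search): greedily add the parent that most improves BIC.
    parents = []
    best = bic_score((), genotypes, disease)
    improved = True
    while improved and candidates:
        improved = False
        top_score, top_j = max(
            (bic_score(tuple(parents + [j]), genotypes, disease), j)
            for j in candidates
        )
        if top_score > best:
            best = top_score
            parents.append(top_j)
            candidates.remove(top_j)
            improved = True
    return sorted(parents)


def simulate(n_samples=2000, n_snps=20, causal=(3, 7), seed=0):
    """Simulated data: disease risk rises with the genotype at each causal SNP."""
    rng = random.Random(seed)
    genotypes, disease = [], []
    for _ in range(n_samples):
        row = [rng.choice((0, 1, 2)) for _ in range(n_snps)]
        risk = 0.05 + 0.2 * sum(row[j] for j in causal)
        genotypes.append(row)
        disease.append(1 if rng.random() < risk else 0)
    return genotypes, disease


genotypes, disease = simulate()
selected = two_stage_select(genotypes, disease)
print("SNPs kept as parents of the disease node:", selected)
```

In this sketch the screen limits how many SNPs the search has to consider, and the BIC penalty, which grows with the number of parent genotype configurations, discourages the search from keeping SNPs that passed the screen by chance. A real genome-wide analysis would need a far stricter screening threshold and a more careful treatment of linkage disequilibrium than this toy example provides.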