IEEE/ACM Trans Comput Biol Bioinform. 2022 Sep-Oct;19(5):2654-2671. doi: 10.1109/TCBB.2021.3092719. Epub 2022 Oct 10.
Developing a more effective and accurate method for detecting epistatic loci in large-scale genomic data is important for improving crop quality, treating disease, and related applications. Because of their high accuracy and ability to model non-linear relationships, Bayesian networks (BNs) have been widely used to construct networks of SNPs and phenotypic traits and thereby to mine epistatic loci. However, BN learning easily falls into local optima and cannot handle large numbers of SNPs. In this work, we transform the problem of learning a Bayesian network into an integer linear programming (ILP) optimization problem. We use branch-and-bound and cutting-plane algorithms to obtain the globally optimal Bayesian network (ILPBN), and thus the epistatic loci that influence specific phenotypic traits. To handle large numbers of SNP loci and further improve efficiency, we use a Markov blanket optimization method to reduce the number of candidate parent nodes for each node. In addition, we calculate the BN score with α-BIC, which is well suited to epistasis mining. We use four properties of decomposable BN scoring functions to further reduce the number of candidate parent sets for each node. Experimental results show that ILPBN can handle not only 2-locus and 3-locus epistasis mining but also multi-locus epistasis detection. Finally, we compare ILPBN with several popular epistasis mining algorithms on simulated data and a real age-related macular degeneration (AMD) dataset. The results show that, compared with the other methods, ILPBN achieves better epistasis detection accuracy, F1-score, and false positive rate while maintaining efficiency. Availability: Code and dataset are available at http://122.205.95.139/ILPBN/.
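The abstract mentions scoring each node's candidate parent sets with a decomposable score and pruning those sets before the ILP is solved. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the standard BIC in place of the paper's α-BIC, whose definition is not given here. It applies one widely used pruning rule for decomposable scores: a parent set can be discarded when one of its proper subsets scores at least as well. That rule may or may not be among the four properties the paper uses. The function names `bic_local_score` and `prune_parent_sets` and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def bic_local_score(rows, child, parents, arity):
    """Standard BIC local score of `child` given `parents` (NOT the paper's alpha-BIC)."""
    n = len(rows)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    parent_cfg = Counter(tuple(r[p] for p in parents) for r in rows)
    # Maximised log-likelihood of the conditional distribution P(child | parents).
    loglik = sum(c * math.log(c / parent_cfg[cfg]) for (cfg, _), c in joint.items())
    # Free parameters: (child states - 1) per joint parent configuration.
    n_params = (arity[child] - 1) * math.prod(arity[p] for p in parents)
    return loglik - 0.5 * math.log(n) * n_params


def prune_parent_sets(rows, child, candidates, arity, max_size=2):
    """Keep only parent sets that no kept proper subset matches or beats."""
    kept = {}
    for size in range(max_size + 1):  # smaller sets are scored first
        for ps in combinations(candidates, size):
            score = bic_local_score(rows, child, ps, arity)
            dominated = any(
                set(sub) < set(ps) and s >= score for sub, s in kept.items()
            )
            if not dominated:
                kept[ps] = score
    return kept


# Toy data (assumption): columns 0-2 are SNPs, column 3 is a binary phenotype
# that depends only on the interaction of SNP 0 and SNP 1 (a pure epistatic effect).
rows = [(a, b, c, a ^ b) for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 25
arity = {0: 3, 1: 3, 2: 3, 3: 2}  # SNP genotypes have 3 states, phenotype has 2

kept = prune_parent_sets(rows, child=3, candidates=[0, 1, 2], arity=arity)
best = max(kept, key=kept.get)
print(sorted(kept), best)
```

On this toy data no single SNP improves on the empty parent set, so every one-SNP set is pruned, while the pair (SNP 0, SNP 1) survives. In an ILP formulation, the surviving sets and their scores would become the binary variables and objective coefficients for each node.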