Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium.
Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium.
BMC Bioinformatics. 2023 Aug 29;24(1):324. doi: 10.1186/s12859-023-05451-5.
Understanding the impact of gene interactions on disease phenotypes is increasingly recognised as a crucial aspect of genetic disease research. This trend is reflected in the growing body of clinical research on oligogenic diseases, where disease manifestations are influenced by combinations of variants in a few specific genes. Although statistical machine-learning methods have been developed to identify variant or gene combinations associated with oligogenic diseases, they rely on abstract features and black-box models, which makes it difficult for medical experts to interpret and validate their predictions. In this work, we present a novel, interpretable predictive approach based on a knowledge graph that not only provides accurate predictions of disease-causing gene interactions but also offers explanations for these results.
We introduce BOCK, a knowledge graph constructed to explore disease-causing genetic interactions, integrating curated information on oligogenic diseases from clinical cases with relevant biomedical networks and ontologies. Using this graph, we developed a novel predictive framework based on heterogeneous paths connecting gene pairs. This method trains an interpretable decision set model that not only accurately predicts pathogenic gene interactions, but also unveils the patterns associated with these diseases. A unique aspect of our approach is its ability to offer, along with each positive prediction, explanations in the form of subgraphs, revealing the specific entities and relationships that led to each pathogenic prediction.
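To make the idea of path-based prediction with subgraph explanations more concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract does not specify the graph schema, the path features, or how the decision set is learned, so everything here is assumed for illustration: the node and edge names, the `RULES` list (hand-written rather than trained), and the helper functions are all hypothetical. The sketch enumerates typed paths between two genes in a toy heterogeneous graph, abstracts each path to its sequence of node and edge types, and returns the paths that satisfy a rule as the explanation.

```python
from collections import Counter, defaultdict

# Toy heterogeneous graph. All node names, node types, and edges are
# invented for illustration; they are not taken from BOCK.
NODE_TYPE = {
    "GENE_A": "Gene",
    "GENE_B": "Gene",
    "PATHWAY_1": "Pathway",
    "PHENOTYPE_1": "Phenotype",
}
EDGES = [
    ("GENE_A", "participates_in", "PATHWAY_1"),
    ("GENE_B", "participates_in", "PATHWAY_1"),
    ("GENE_A", "associated_with", "PHENOTYPE_1"),
    ("GENE_B", "associated_with", "PHENOTYPE_1"),
    ("GENE_A", "interacts_with", "GENE_B"),
]

# Hypothetical decision set: (path type pattern, minimum number of matching paths).
# A real decision set would be learned from labelled gene pairs.
RULES = [
    (("Gene", "participates_in", "Pathway", "participates_in", "Gene"), 1),
    (("Gene", "associated_with", "Disease", "associated_with", "Gene"), 1),
]


def build_adjacency(edges):
    """Treat every edge as undirected and index neighbours by node."""
    adj = defaultdict(list)
    for src, rel, dst in edges:
        adj[src].append((rel, dst))
        adj[dst].append((rel, src))
    return adj


def simple_paths(adj, source, target, max_edges):
    """Yield simple paths as alternating [node, relation, node, ...] lists."""
    stack = [(source, [source], {source})]
    while stack:
        node, path, seen = stack.pop()
        if node == target and len(path) > 1:
            yield path
            continue
        if (len(path) - 1) // 2 >= max_edges:
            continue  # path already uses the maximum number of edges
        for rel, nxt in adj[node]:
            if nxt not in seen:
                stack.append((nxt, path + [rel, nxt], seen | {nxt}))


def path_pattern(path):
    """Replace each node by its type, keeping relation names as they are."""
    return tuple(NODE_TYPE[x] if i % 2 == 0 else x for i, x in enumerate(path))


def predict_with_explanation(adj, gene_a, gene_b, rules, max_edges=2):
    """Predict positive if any rule fires; return the supporting paths."""
    paths = list(simple_paths(adj, gene_a, gene_b, max_edges))
    counts = Counter(path_pattern(p) for p in paths)
    fired = [pattern for pattern, min_count in rules if counts[pattern] >= min_count]
    explanation = [p for p in paths if path_pattern(p) in fired]
    return bool(fired), explanation


adjacency = build_adjacency(EDGES)
predicted, explanation = predict_with_explanation(adjacency, "GENE_A", "GENE_B", RULES)
print(predicted)
for path in explanation:
    print(" - ".join(path))
```

In this toy setting only the shared-pathway rule fires, so the explanation is the single path through `PATHWAY_1`; the direct interaction and the shared phenotype are found as paths but are not part of the explanation because no rule refers to them.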
Our method, built with interpretability in mind, leverages heterogeneous path information in knowledge graphs to predict pathogenic gene interactions and generate meaningful explanations. This not only broadens our understanding of the molecular mechanisms underlying oligogenic diseases, but also presents a novel application of knowledge graphs in creating more transparent and insightful predictors for genetic research.