Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, Austria.
MI2DataLab, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland.
Sci Rep. 2022 Oct 7;12(1):16857. doi: 10.1038/s41598-022-21417-8.
Machine learning methods can detect complex relationships between variables, but usually do not exploit domain knowledge. This is a limitation because in many scientific disciplines, such as systems biology, domain knowledge is available in the form of graphs or networks, and using it can improve model performance. Network-based algorithms are therefore needed that are versatile and applicable across many research areas. In this work, we demonstrate subnetwork detection based on multi-modal node features using a novel Greedy Decision Forest (GDF) with inherent interpretability. This interpretability will be crucial for keeping domain experts involved and gaining their trust in such algorithms. As a concrete application example, we focus on bioinformatics, systems biology and particularly biomedicine, but the presented methodology is applicable in many other domains as well. Systems biology is a good example of a field in which statistical, data-driven machine learning enables the analysis of large amounts of multi-modal biomedical data. This is important for reaching the future goal of precision medicine, where the complexity of patients is modeled at the system level to best tailor medical decisions, health practices and therapies to the individual patient. Our proposed explainable approach can help to uncover disease-causing network modules from multi-omics data to better understand complex diseases such as cancer.