Department of Pure and Applied Sciences, University of Urbino, Piazza della Repubblica, 13, Urbino, 61029, Italy.
BMC Med Inform Decis Mak. 2024 Jun 28;24(Suppl 4):186. doi: 10.1186/s12911-024-02582-4.
Clinical medicine offers a promising arena for applying Machine Learning (ML) models. However, despite numerous studies employing ML in medical data analysis, only a fraction have impacted clinical care. This article underscores the importance of utilising ML in medical data analysis while recognising that ML alone may not adequately capture the full complexity of clinical data, and it therefore advocates integrating medical domain knowledge into ML.
The study conducts a comprehensive review of prior efforts in integrating medical knowledge into ML and maps these integration strategies onto the phases of the ML pipeline, encompassing data pre-processing, feature engineering, model training, and output evaluation. The study further explores the significance and impact of such integration through a case study on diabetes prediction. Here, clinical knowledge, encompassing rules, causal networks, intervals, and formulas, is integrated at each stage of the ML pipeline, resulting in a spectrum of integrated models.
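As an illustration of what integration at the early pipeline stages could look like, the following minimal Python sketch applies an interval during pre-processing and a formula and a rule during feature engineering. It is not the study's implementation: the `Patient` fields, the function names, and the plausibility interval for glucose are hypothetical choices made for this example. The body mass index formula and the fasting plasma glucose cut-offs of 100 and 126 mg/dL are widely used clinical conventions, assumed here rather than taken from the article.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical plausibility interval for fasting glucose in mg/dL
# (an assumption for illustration, not a value taken from the study).
GLUCOSE_PLAUSIBLE_MG_DL = (20.0, 1000.0)

# Widely used cut-offs for fasting plasma glucose in mg/dL
# (assumed here; the study may encode different rules).
PREDIABETES_FPG = 100.0
DIABETES_FPG = 126.0


@dataclass
class Patient:
    weight_kg: float
    height_m: float
    fasting_glucose_mg_dl: Optional[float]


def clean_glucose(value: Optional[float]) -> Optional[float]:
    """Pre-processing with an interval: treat implausible readings as missing."""
    if value is None:
        return None
    low, high = GLUCOSE_PLAUSIBLE_MG_DL
    return value if low <= value <= high else None


def bmi(weight_kg: float, height_m: float) -> float:
    """Feature engineering with a formula: BMI = weight / height squared."""
    return weight_kg / (height_m ** 2)


def glucose_category(value: Optional[float]) -> str:
    """Feature engineering with a rule: map glucose to a clinical category."""
    if value is None:
        return "unknown"
    if value >= DIABETES_FPG:
        return "diabetes_range"
    if value >= PREDIABETES_FPG:
        return "prediabetes_range"
    return "normal_range"


def build_features(p: Patient) -> dict:
    """Combine the knowledge-derived features into one record for a model."""
    glucose = clean_glucose(p.fasting_glucose_mg_dl)
    return {
        "bmi": round(bmi(p.weight_kg, p.height_m), 1),
        "glucose": glucose,
        "glucose_category": glucose_category(glucose),
    }
```

Calling `build_features(Patient(80.0, 2.0, 130.0))` yields a record with a BMI of 20.0 and the category `diabetes_range`, while a glucose reading of 0.0 falls outside the assumed interval and is passed on as missing. Features of this kind could then be supplied to any downstream classifier alongside the raw measurements.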
The findings highlight the benefits of integration in terms of accuracy, interpretability, data efficiency, and adherence to clinical guidelines. In several cases, integrated models outperformed purely data-driven approaches, underscoring the potential for domain knowledge to enhance ML models through improved generalisation. In other cases, the integration was instrumental in enhancing model interpretability and ensuring conformity with established clinical guidelines. Notably, knowledge integration also proved effective in maintaining performance under limited data scenarios.
By illustrating various integration strategies through a clinical case study, this work provides guidance to inspire and facilitate future integration efforts. Furthermore, the study identifies two main challenges to integration: refining the representation of domain knowledge and fine-tuning its contribution to the ML model. It aims to stimulate further research in this direction.