Kocbek Simon, Kocbek Primož, Gosak Lucija, Fijačko Nino, Štiglic Gregor
Institute of Informatics, Faculty of Electrical Engineering and Computer Science, University of Maribor, 2000 Maribor, Slovenia.
Faculty of Health Sciences, University of Maribor, 2000 Maribor, Slovenia.
J Pers Med. 2022 Feb 28;12(3):368. doi: 10.3390/jpm12030368.
Type 2 diabetes mellitus (T2DM) often results in high morbidity and mortality. In addition, T2DM presents a substantial financial burden for individuals and their families, health systems, and societies. According to studies and reports, the incidence and prevalence of T2DM are increasing rapidly worldwide. Several models have been built to predict future T2DM onset or to detect undiagnosed T2DM in patients. In addition to the performance of such models, their interpretability is crucial for health experts, especially in personalized clinical prediction models. This study used data collected over 42 months from the health check-up examination and prescribed drug data repositories of four primary healthcare providers. We propose a framework for undiagnosed T2DM prediction that consists of LogicRegression-based feature extraction and prediction modeling based on the Least Absolute Shrinkage and Selection Operator (LASSO). Model performance was measured using the area under the ROC curve (AUC) with corresponding confidence intervals. The results show that LogicRegression-based feature extraction produced simpler models, which are easier for healthcare experts to interpret, especially in cases with many binary features. Models developed using the proposed framework achieved an AUC of 0.818 (95% confidence interval (CI): 0.812-0.823), comparable to the AUC of 0.816 (95% CI: 0.810-0.822) achieved by more complex models (i.e., models with a larger number of features) in which all features were included in prediction model development. However, the difference in the number of features used was significant. This study proposes a framework for building interpretable models in healthcare that can contribute to higher trust in prediction models among healthcare experts.
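The abstract reports model performance as an AUC with a 95% confidence interval but does not say how the interval was obtained. As a minimal sketch, assuming a percentile bootstrap (an assumption, not a method confirmed by the source), the computation could look like the following. The function names `auc` and `bootstrap_auc_ci` and the toy data are hypothetical and purely illustrative.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Uses the Mann-Whitney identity; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class, where AUC is undefined.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data only: synthetic labels and risk scores, not the study's data.
demo_rng = random.Random(42)
demo_labels = [1 if demo_rng.random() < 0.3 else 0 for _ in range(200)]
demo_scores = [demo_rng.gauss(0.6 if y == 1 else 0.4, 0.15) for y in demo_labels]
print("AUC:", auc(demo_labels, demo_scores))
print("95% CI:", bootstrap_auc_ci(demo_labels, demo_scores, n_boot=200))
```

The pairwise comparison is quadratic in the number of cases, which is fine for a small illustration; for data on the scale of a multi-provider study, a rank-based AUC routine from a statistics library would be the more practical choice.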