IEEE J Biomed Health Inform. 2020 Jan;24(1):235-246. doi: 10.1109/JBHI.2019.2899218. Epub 2019 Feb 13.
The diagnosis of type 2 diabetes (T2D) at an early stage plays a key role in an adequate integrated T2D management system and in patient follow-up. Recent years have witnessed a growing amount of available electronic health record (EHR) data, and machine learning (ML) techniques have evolved considerably. However, managing and modeling this amount of information may lead to several challenges, such as overfitting, limited model interpretability, and computational cost. Motivated by these challenges, we introduced an ML method called sparse balanced support vector machine (SB-SVM) for discovering T2D in a newly collected EHR dataset (named the Federazione Italiana Medici di Medicina Generale dataset). In particular, among all the EHR features related to exemptions, examinations, and drug prescriptions, we selected only those collected before T2D diagnosis from a uniform age group of subjects. We demonstrated the reliability of the introduced approach with respect to other ML and deep learning approaches widely employed in the state of the art for solving this task. The results show that SB-SVM outperforms the other state-of-the-art competitors, providing the best compromise between predictive performance and computation time. Additionally, the induced sparsity increases model interpretability, while implicitly handling high-dimensional data and the usual unbalanced class distribution.
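The abstract names SB-SVM but does not give its objective function. As a rough illustration of the two ingredients the name suggests (sparsity and class balance), the following is a minimal NumPy sketch of an L1-penalised, class-weighted linear SVM with squared hinge loss, fitted by proximal gradient descent (ISTA). This is an assumed stand-in, not the authors' SB-SVM formulation: the function names, the per-sample weights n / (2 n_c), the penalty `lam`, the solver, and the synthetic unbalanced data are all hypothetical choices made for illustration.

```python
import numpy as np


def balanced_weights(y):
    """Hypothetical class-balancing scheme: weight n / (2 * n_c) per sample,
    so that each class contributes the same total weight (n / 2)."""
    y = np.asarray(y)
    n = y.size
    w = np.empty(n, dtype=float)
    for c in (-1, 1):
        mask = y == c
        w[mask] = n / (2.0 * mask.sum())
    return w


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1; this is what zeroes out coefficients."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_sparse_balanced_svm(X, y, lam=0.1, n_iter=2000):
    """Sketch only: minimise (1/n) * sum_i w_i * max(0, 1 - y_i (x_i.beta + b))^2
    + lam * ||beta||_1 by proximal gradient descent. Labels must be in {-1, +1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w = balanced_weights(y)
    A = np.hstack([X, np.ones((n, 1))])                 # design matrix with intercept column
    L = 2.0 * w.max() * np.linalg.norm(A, 2) ** 2 / n   # Lipschitz bound of the smooth part
    step = 1.0 / L
    beta, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        slack = np.maximum(0.0, 1.0 - y * (X @ beta + b))
        coef = -2.0 * w * slack * y / n                  # derivative of the loss w.r.t. each score
        beta = soft_threshold(beta - step * (X.T @ coef), step * lam)
        b -= step * coef.sum()                           # intercept is not penalised
    return beta, b


def predict(X, beta, b):
    return np.where(np.asarray(X, dtype=float) @ beta + b >= 0, 1, -1)


# Synthetic, unbalanced toy data (hypothetical): 40 positives vs 360 negatives,
# where only the first two of 20 features carry signal.
rng = np.random.default_rng(0)
n_pos, n_neg, d = 40, 360, 20
X = rng.normal(size=(n_pos + n_neg, d))
y = np.r_[np.ones(n_pos), -np.ones(n_neg)]
X[:n_pos, :2] += 2.5
beta, b = fit_sparse_balanced_svm(X, y, lam=0.1)
print(np.count_nonzero(beta), "non-zero coefficients out of", d)
```

The L1 penalty drives uninformative coefficients to exactly zero, which is the kind of sparsity that makes the retained EHR features inspectable, while the class weights keep the minority (T2D) class from being swamped by the majority class. Raising `lam` trades predictive accuracy for fewer selected features.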