Department of Neurology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City, Taiwan.
Graduate Institute of Business Administration, College of Management, Fu Jen Catholic University, New Taipei City, Taiwan.
BMC Med Res Methodol. 2024 Nov 22;24(1):288. doi: 10.1186/s12874-024-02411-2.
Dementia is a significant medical and social issue in most developed countries. Practical tools for predicting the progression of degenerative dementia are highly valuable. Machine learning (ML) methods facilitate the construction of effective models using real-world data, which may include missing values and various integrated datasets.
This retrospective study analyzed data from 679 patients diagnosed with degenerative dementia at Fu Jen Catholic University Hospital, who were evaluated by neurologists and psychologists and followed for over two years. Predictive variables were categorized into four groups: demographic (D), clinical dementia rating (CDR), mini-mental state examination (MMSE), and laboratory data value (LV). These groups were then combined into three cumulative sets (D-CDR, D-CDR-MMSE, and D-CDR-MMSE-LV). We used the extreme gradient boosting (XGB) model to rank variable importance and identified the most effective feature combination via a step-wise approach.
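The abstract does not spell out the step-wise procedure, so the following is only a minimal sketch of one common reading: walk through the variables in importance order and keep each one only if it improves the model score. The function `stepwise_select`, the placeholder variable names `var_1` to `var_5`, and the toy scorer `toy_score` are all hypothetical. In practice the scorer would be something like the cross-validated AUC of an XGB model refit on the candidate subset, and the ranking would come from the fitted model's feature importances.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def stepwise_select(
    ranked: Sequence[str],
    score_fn: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Forward step-wise selection over an importance-ranked variable list.

    Variables are tried in rank order; a variable is kept only if adding it
    strictly improves the score returned by score_fn.
    """
    selected: List[str] = []
    best_score = float("-inf")
    for name in ranked:
        candidate = selected + [name]
        score = score_fn(candidate)
        if score > best_score:
            selected, best_score = candidate, score
    return selected, best_score


# Hypothetical ranking (most important first) and made-up score contributions.
# These are placeholders, not variables or results from the study.
RANKED = ["var_1", "var_2", "var_3", "var_4", "var_5"]
TOY_GAINS: Dict[str, float] = {
    "var_1": 0.20,
    "var_2": 0.10,
    "var_3": 0.03,
    "var_4": -0.01,  # adding this variable lowers the toy score
    "var_5": 0.02,
}


def toy_score(features: List[str]) -> float:
    """Stand-in for a cross-validated AUC: a baseline plus additive gains."""
    return 0.5 + sum(TOY_GAINS[f] for f in features)


if __name__ == "__main__":
    chosen, score = stepwise_select(RANKED, toy_score)
    print(chosen)  # ['var_1', 'var_2', 'var_3', 'var_5']
```

With a real model, each call to the scorer refits the classifier, so the cost grows linearly with the number of ranked variables. That is usually acceptable for a few dozen clinical variables.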
The D-CDR-MMSE-LV combination showed robust performance, with a high area under the receiver operating characteristic curve (AUC) and the highest sensitivity (84.66) among the combinations tested. Using both demographic and neuropsychiatric variables, our prediction model achieved an AUC of 83.74. By incorporating additional clinical information from laboratory data and applying our proposed feature selection strategy, we constructed an eight-variable XGB model that achieved an AUC of 85.12.
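For readers less familiar with the two metrics reported here, the sketch below computes them from scratch on made-up data. It assumes the reported values are percentages (so 85.12 would correspond to 0.8512 on a 0 to 1 scale) and that sensitivity is measured at a fixed decision threshold; the abstract states neither. The function names and the example labels and scores are illustrative only.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def sensitivity(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """True-positive rate: TP / (TP + FN) at the given decision threshold."""
    true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    false_neg = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if true_pos + false_neg == 0:
        raise ValueError("sensitivity needs at least one positive case")
    return true_pos / (true_pos + false_neg)


# Made-up example: 1 = progressed, 0 = did not progress (not study data).
LABELS = [1, 1, 1, 0, 0, 0]
SCORES = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

if __name__ == "__main__":
    print(f"AUC = {100 * auc(LABELS, SCORES):.2f}")                        # AUC = 88.89
    print(f"Sensitivity = {100 * sensitivity(LABELS, SCORES, 0.5):.2f}")   # Sensitivity = 66.67
```

The pairwise AUC is quadratic in the number of cases, which is fine for a cohort of a few hundred patients. Larger datasets would normally use a rank-based implementation from a statistics library.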
We established a machine-learning model to monitor the progression of dementia using a limited, real-world clinical dataset. The XGB technique identified eight critical variables across our integrated datasets, potentially providing clinicians with valuable guidance.