Cumming School of Medicine, University of Calgary.
Centre for Health Services and Policy Research, University of British Columbia.
Int J Popul Data Sci. 2021 Sep 10;6(1):1650. doi: 10.23889/ijpds.v6i1.1650. eCollection 2021.
Frailty is a medical syndrome that commonly affects people aged 65 years and over and is characterized by a greater risk of adverse outcomes following illness or injury. Electronic medical records contain a large amount of longitudinal data that can be used for primary care research. Machine learning can make full use of this breadth of data to detect diseases and syndromes. Creating a frailty case definition using machine learning may facilitate early intervention, inform advanced screening tests, and allow for surveillance.
The objective of this study was to develop a validated case definition of frailty for the primary care context, using machine learning.
Physicians participating in the Canadian Primary Care Sentinel Surveillance Network across Canada were asked to retrospectively identify the level of frailty present in a sample of their own patients (total n = 5,466), collected from 2015 to 2019. Frailty levels were dichotomized using a cut-off of 5. Extracted features included previously prescribed medications, billing codes, and other routinely collected primary care data. We used eight supervised machine learning algorithms and assessed performance on a hold-out test set. A balanced training dataset was also created by oversampling. Sensitivity analyses considered two alternative dichotomization cut-offs. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), F1 score, accuracy, sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV).
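The labelling, balancing and evaluation steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: it assumes that frailty levels at or above the cut-off count as frail, uses simple random oversampling (duplicating minority-class rows) as a stand-in for whatever oversampling method the authors used, and computes the area under the ROC curve with its rank-based (Mann-Whitney) formulation. All function names are hypothetical. The eight classifiers themselves, including XGBoost, are omitted; any model that outputs a predicted probability of frailty could supply the scores.

```python
import random
from typing import Dict, List, Sequence, Tuple


def dichotomize(levels: Sequence[int], cutoff: int = 5) -> List[int]:
    """Label a patient frail (1) if their frailty level is at or above the cut-off.

    Assumption: ">= cutoff" is the frail side; the abstract states only the value 5.
    """
    return [1 if level >= cutoff else 0 for level in levels]


def random_oversample(X: Sequence, y: Sequence[int], seed: int = 0) -> Tuple[list, list]:
    """Duplicate randomly chosen minority-class rows until both classes have equal size.

    Hypothetical stand-in for the study's oversampling; apply to the training split only.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-dependent metrics from the confusion matrix (frail = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(a: float, b: float) -> float:
        # Undefined metrics (empty denominator) are reported as NaN.
        return a / b if b else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        # F1 = harmonic mean of PPV (precision) and sensitivity (recall).
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random frail patient outscores a random non-frail one.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Oversampling is applied only to the training split in this sketch; oversampling the hold-out set would change its prevalence and so distort prevalence-dependent metrics such as accuracy, PPV and NPV. The sensitivity analyses on alternative cut-offs would amount to calling `dichotomize` with a different `cutoff` and repeating the same pipeline.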
The prevalence of frailty within our sample was 18.4%. Of the eight models developed to identify frail patients, an XGBoost model achieved the highest sensitivity (78.14%) and specificity (74.41%). The balanced training dataset did not improve classification performance. Sensitivity analyses did not show improved performance for cut-offs other than 5.
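Sensitivity and specificity do not depend on prevalence, but positive and negative predictive values do. With sensitivity $s$, specificity $c$ and prevalence $\pi$, Bayes' theorem gives $\mathrm{PPV} = s\pi / (s\pi + (1-c)(1-\pi))$ and $\mathrm{NPV} = c(1-\pi) / (c(1-\pi) + (1-s)\pi)$. The sketch below plugs the reported XGBoost sensitivity and specificity into these formulas at the sample prevalence of 18.4%. It is illustrative arithmetic only: it assumes the hold-out test set has the same prevalence as the full sample, the function name is hypothetical, and the outputs are not the PPV and NPV reported by the study.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) via Bayes' theorem for a given prevalence."""
    tp = sensitivity * prevalence              # true-positive fraction of the population
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Reported XGBoost sensitivity/specificity at the sample prevalence (illustrative assumption).
ppv, npv = predictive_values(0.7814, 0.7441, 0.184)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")  # prints "PPV = 0.408, NPV = 0.938"
```

Under these assumptions, fewer than half of the patients flagged as frail would actually be frail, while a negative prediction would be correct about 94% of the time. This is a general consequence of a minority-class prevalence near 18% rather than a property of any particular model.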
Supervised machine learning was able to create well-performing classification models for frailty. Future research is needed to assess the inter-rater reliability of frailty ratings and to link multiple data sources for frailty identification.