Application of machine learning methods for the prediction of true fasting status in patients performing blood tests.

Affiliations

Big Data Center, China Medical University Hospital and College of Medicine, China Medical University, 2, Yude Rd., North Dist., Taichung, 404, Taiwan.

The PhD Program for Cancer Biology and Drug Discovery, College of Medicine, China Medical University, Taichung, Taiwan.

Publication information

Sci Rep. 2022 Jul 13;12(1):11929. doi: 10.1038/s41598-022-15161-2.

DOI:10.1038/s41598-022-15161-2

PMID:35831336

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9279373/

Abstract

Existing research assumes that fasting blood glucose (FBG) values extracted from electronic medical records (EMR) are valid, which may cause diagnostic bias when fasting status is misclassified. We proposed a machine learning (ML) algorithm to predict the fasting status of blood samples. This cross-sectional study used the EMR of a medical center from 2003 to 2018 and included a total of 2,196,833 ontological FBGs from the outpatient service. The theoretical true fasting status was identified by comparing the ontological FBG values with average glucose levels derived from concomitantly tested HbA1c, based on multiple criteria. We extracted 67 features and predicted the fasting status using eXtreme Gradient Boosting (XGBoost) in addition to multiple logistic regression. The discrimination and calibration of the prediction models were also assessed. Real-world performance was gauged by the prevalence of ineffective glucose measurement (IGM). Of the 784,340 ontologically labeled fasting samples, 77.1% were considered theoretical FBGs. In patients without diabetes mellitus (DM), the median (IQR) glucose and HbA1c levels were 94.0 (87.0, 102.0) mg/dL and 5.6 (5.4, 5.9)% for ontological fasting samples, and 92.0 (86.0, 99.0) mg/dL and 5.6 (5.4, 5.9)% for theoretical fasting samples. In the parsimonious approach, XGBoost showed calibration comparable to that of multiple logistic regression and an AUROC of 0.887 versus 0.868, and it identified glucose level, home-to-hospital distance, age, and concomitant serum creatinine and lipid testing as important predictors. The prevalence of IGM dropped from 27.8% based on ontological FBGs to 0.48% when algorithm-verified FBGs were used. The proposed ML algorithm or multiple logistic regression model aids in verifying fasting status.
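The comparison of a recorded glucose value against an HbA1c-derived average glucose can be illustrated with a minimal Python sketch. It is not the study's labeling procedure: the abstract does not spell out its multiple criteria, so the single rule below (`is_plausibly_fasting`, with its `margin_mg_dl` parameter) is a hypothetical stand-in. The conversion from HbA1c to estimated average glucose uses the widely cited ADAG regression, eAG (mg/dL) = 28.7 × HbA1c (%) − 46.7, which is assumed here and may differ from the formula the authors applied.

```python
def estimated_average_glucose(hba1c_percent: float) -> float:
    """Estimated average glucose in mg/dL from HbA1c in %.

    Uses the ADAG regression eAG = 28.7 * HbA1c - 46.7 (an assumption;
    the abstract does not state which conversion the study used).
    """
    return 28.7 * hba1c_percent - 46.7


def is_plausibly_fasting(glucose_mg_dl: float,
                         hba1c_percent: float,
                         margin_mg_dl: float = 0.0) -> bool:
    """Hypothetical single-criterion check, NOT the paper's multi-criteria rule.

    A sample labeled as fasting is treated as plausible when its glucose
    does not exceed the HbA1c-derived average glucose plus a margin.
    """
    threshold = estimated_average_glucose(hba1c_percent) + margin_mg_dl
    return glucose_mg_dl <= threshold


# Illustrative values only; these are not records from the study.
samples = [
    {"glucose": 92.0, "hba1c": 5.6},
    {"glucose": 165.0, "hba1c": 5.6},
]

for s in samples:
    eag = estimated_average_glucose(s["hba1c"])
    ok = is_plausibly_fasting(s["glucose"], s["hba1c"])
    print(f"glucose={s['glucose']:.1f} mg/dL, eAG={eag:.1f} mg/dL, plausible fasting={ok}")
```

A glucose value well above the long-run average implied by HbA1c is hard to reconcile with a true fasting draw, which is the intuition behind using HbA1c as a reference when the fasting label itself cannot be trusted.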
