Wang Ruoning, Luo Wenjing, Liu Zifeng, Liu Weilong, Liu Chunxin, Liu Xun, Zhu He, Li Rui, Song Jiafang, Hu Xueqiang, Han Sheng, Qiu Wei
Department of Continuing Medical Education, Peking University Health Science Center, Beijing, China.
Department of Neurology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China.
Mult Scler Relat Disord. 2021 Jan;47:102632. doi: 10.1016/j.msard.2020.102632. Epub 2020 Nov 18.
Delayed diagnosis of multiple sclerosis (MS) is not uncommon, and an early diagnostic tool is urgently needed. We aimed to develop an effective tool, based on electronic health records and machine learning techniques, for the early recognition of MS patients among hospital visitors in China.
Two case sets were collected from January 2016 to December 2018. The training set comprised 239 MS patients and 1142 controls, and the test set comprised 23 MS patients and 92 controls. The utility of Extreme Gradient Boosting (XGBoost), Random Forest (RF), Naive Bayes, K-nearest neighbor (KNN), and Support Vector Machine (SVM) for the early diagnosis of MS was evaluated using the area under the receiver operating characteristic curve, precision, recall, specificity, accuracy, and F1 score.
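The evaluation metrics named above can be illustrated with a minimal sketch. The function names (`binary_metrics`, `roc_auc`), the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration; they are not the study's code or data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for binary labels (1 = MS, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "accuracy": accuracy, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Precision, recall, specificity, accuracy, and F1 depend on the chosen decision threshold, whereas the AUC summarizes ranking quality across all thresholds.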
XGBoost performed best and was used to generate the results. Thirty-four variables highly relevant to MS diagnosis were selected for the XGBoost model, and their relative importance for MS diagnosis was ranked. On the training set, recall was 0.632 and precision was 0.576; on the test set, recall was 0.609 and precision was 0.609. Our study found that 61%, 51%, and 49% of the patients could have been diagnosed with MS 1, 2, and 3 years, respectively, before their actual time of diagnosis.
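As a plausibility check on the reported test-set figures, the sketch below searches for integer confusion-matrix counts consistent with 23 MS patients, 92 controls, and recall and precision of 0.609. It assumes both metrics were computed at a single threshold over the whole test set and reported rounded to three decimals. The resulting counts, specificity, and accuracy are derived here and are not reported in the abstract; `consistent_counts` is a hypothetical helper.

```python
from typing import List, Tuple


def consistent_counts(n_pos: int, n_neg: int, recall: float,
                      precision: float, tol: float = 0.0005) -> List[Tuple[int, int]]:
    """Return every (true positives, false positives) pair whose recall and
    precision match the reported values to within rounding tolerance."""
    matches = []
    for tp in range(1, n_pos + 1):
        if abs(tp / n_pos - recall) > tol:
            continue
        for fp in range(n_neg + 1):
            if abs(tp / (tp + fp) - precision) <= tol:
                matches.append((tp, fp))
    return matches


N_MS, N_CONTROL = 23, 92          # test-set sizes from the abstract
matches = consistent_counts(N_MS, N_CONTROL, recall=0.609, precision=0.609)
print(matches)                    # -> [(14, 9)]

# Derived quantities (not reported in the abstract).
tp, fp = matches[0]
fn, tn = N_MS - tp, N_CONTROL - fp
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (N_MS + N_CONTROL)
print(f"TP={tp} FP={fp} FN={fn} TN={tn} "
      f"specificity={specificity:.3f} accuracy={accuracy:.3f}")
```

Under these assumptions only one pair of counts fits: 14 of the 23 MS patients flagged, alongside 9 false positives among the 92 controls.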
A diagnostic tool for early MS recognition, based on the XGBoost model and electronic health records, was developed to help reduce diagnostic delays in MS.