Ning Wanshan, Wang Zhicheng, Gu Ying, Huang Lindan, Liu Shuai, Chen Qun, Yang Yunyun, Hong Guolin
Department of Laboratory Medicine, Xiamen Key Laboratory of Genetic Testing, School of Medicine, the First Affiliated Hospital of Xiamen University, Xiamen University, Xiamen, 361003, Fujian, China.
Institute for Clinical Medical Research, School of Medicine, the First Affiliated Hospital of Xiamen University, Xiamen University, Xiamen, 361003, Fujian, China.
Sci Rep. 2025 Jul 30;15(1):27857. doi: 10.1038/s41598-025-09439-4.
Nervous system diseases are the leading cause of disability-adjusted life-years and the second leading cause of death worldwide. Traditional diagnostic methods for these diseases are expensive. This study therefore aimed to construct machine learning models for diagnosing nervous system diseases from convenient routine blood and biochemical test data. After data preprocessing, routine blood and biochemical test data from 25,794 healthy people and 7,518 patients with nervous system diseases were used. We constructed models with logistic regression, random forest, support vector machine, eXtreme Gradient Boosting (XGBoost), and deep neural network, and used the SHAP algorithm to interpret them. The nervous system disease prediction model built with XGBoost performed best (AUC: 0.9782). Most of the models for distinguishing among individual nervous system diseases also performed well; the model distinguishing neuromyelitis optica from other nervous system diseases performed best (AUC: 0.9095). SHAP-based interpretation indicated that features from biochemical testing contributed most to predicting nervous system disease. In summary, this study constructed multiple models using 52 features from routine blood and biochemical test data for the diagnosis of various nervous system diseases, and explored the distinct hematologic features of these diseases. This cost-effective approach can benefit more people and assist in the diagnosis and prevention of nervous system diseases.
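The abstract reports model performance as AUC (for example 0.9782 for the XGBoost model). As a reading aid, the following is a minimal sketch of how an AUC value can be computed from binary labels and model scores, using the rank-sum (Mann-Whitney) formulation. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are assumptions made for illustration, and no data from the study is used.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the AUC: the probability that a randomly chosen positive
    (label 1) receives a higher score than a randomly chosen negative
    (label 0), with tied scores counted as one half."""
    n = len(scores)
    if n != len(labels):
        raise ValueError("labels and scores must have the same length")

    # Sort sample indices by score, then assign 1-based ranks,
    # giving every group of tied scores its average rank.
    order = sorted(range(n), key=lambda idx: scores[idx])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")

    # Mann-Whitney U statistic for the positive class, normalised to [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = patient, 0 = healthy control.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of patients from controls, which is the scale on which the values reported in the abstract should be read.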