Development of machine learning model for diagnostic disease prediction based on laboratory tests.

Affiliations

Department of Laboratory Medicine, College of Medicine, Ewha Womans University of Korea, Seoul, South Korea.

Department of Laboratory Medicine, St. Vincent's Hospital, The Catholic University of Korea, Seoul, South Korea.

Publication information

Sci Rep. 2021 Apr 7;11(1):7567. doi: 10.1038/s41598-021-87171-5.

Abstract

The use of deep learning and machine learning (ML) in medical science is increasing, particularly in the visual, audio, and language data fields. We aimed to build a new optimized ensemble model by blending a DNN (deep neural network) model with two ML models for disease prediction using laboratory test results. 86 attributes (laboratory tests) were selected from datasets based on value counts, clinical importance-related features, and missing values. We collected sample datasets on 5145 cases, including 326,686 laboratory test results. We investigated a total of 39 specific diseases based on the International Classification of Diseases, 10th revision (ICD-10) codes. These datasets were used to construct light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost) ML models and a DNN model using TensorFlow. The optimized ensemble model achieved an F1-score of 81% and prediction accuracy of 92% for the five most common diseases. The deep learning and ML models showed differences in predictive power and disease classification patterns. We used a confusion matrix and analyzed feature importance using the SHAP value method. Our new ML model achieved high efficiency of disease prediction through classification of diseases. This study will be useful in the prediction and diagnosis of diseases.
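The abstract does not say how the DNN, LightGBM, and XGBoost outputs were blended, so the following is only a minimal sketch under one common assumption: a weighted average of each model's per-class probabilities (soft voting), scored with accuracy and macro-averaged F1 taken from a confusion matrix. The probabilities, weights, and labels are made-up toy values, not data from the study, and all function names are hypothetical.

```python
from typing import List, Sequence

Probs = List[List[float]]  # one row per sample, one column per class


def blend_probabilities(model_probs: Sequence[Probs], weights: Sequence[float]) -> Probs:
    """Weighted average of per-class probabilities across models (soft voting)."""
    total = sum(weights)
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    return [
        [
            sum(w * probs[i][c] for w, probs in zip(weights, model_probs)) / total
            for c in range(n_classes)
        ]
        for i in range(n_samples)
    ]


def predict(blended: Probs) -> List[int]:
    """Pick the class with the highest blended probability for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in blended]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def macro_f1(matrix: List[List[int]]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    n = len(matrix)
    scores = []
    for c in range(n):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n)) - tp
        fn = sum(matrix[c]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Toy probabilities standing in for three trained models (illustrative only).
dnn_probs = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4], [0.5, 0.4, 0.1]]
lgbm_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.2, 0.2, 0.6], [0.3, 0.6, 0.1]]
xgb_probs = [[0.8, 0.1, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6], [0.4, 0.5, 0.1]]
y_true = [0, 1, 2, 0]  # hypothetical disease class labels

blended = blend_probabilities([dnn_probs, lgbm_probs, xgb_probs], weights=[1.0, 1.0, 1.0])
y_pred = predict(blended)
matrix = confusion_matrix(y_true, y_pred, n_classes=3)
print("accuracy:", accuracy(matrix))
print("macro F1:", macro_f1(matrix))
```

Equal weights are used here only for simplicity; in practice the weights would typically be tuned on a validation set, and whether the study did so is not stated in the abstract.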

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b597/8026627/74d8fc2ee186/41598_2021_87171_Fig1_HTML.jpg
