Ozawa Takuya, Chubachi Shotaro, Namkoong Ho, Nemoto Shota, Ikegami Ryo, Asakura Takanori, Tanaka Hiromu, Lee Ho, Fukushima Takahiro, Azekawa Shuhei, Otake Shiro, Nakagawara Kensuke, Watase Mayuko, Masaki Katsunori, Kamata Hirofumi, Harada Norihiro, Ueda Tetsuya, Ueda Soichiro, Ishiguro Takashi, Arimura Ken, Saito Fukuki, Yoshiyama Takashi, Nakano Yasushi, Muto Yoshikazu, Suzuki Yusuke, Edahiro Ryuya, Murakami Koji, Sato Yasunori, Okada Yukinori, Koike Ryuji, Ishii Makoto, Hasegawa Naoki, Kitagawa Yuko, Tokunaga Katsushi, Kimura Akinori, Miyano Satoru, Ogawa Seishi, Kanai Takanori, Fukunaga Koichi, Imoto Seiya
Division of Pulmonary Medicine, Department of Internal Medicine, Keio University School of Medicine, Tokyo, Japan.
Department of Infectious Diseases, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan.
Sci Rep. 2025 Mar 19;15(1):9459. doi: 10.1038/s41598-025-85733-5.
Predictive models for determining coronavirus disease 2019 (COVID-19) severity have been established; however, the complexity of the interactions among factors limits the use of conventional statistical methods. This study aimed to establish a simple and accurate predictive model for COVID-19 severity using an explainable machine learning approach. A total of 3,301 patients aged ≥ 18 years who were diagnosed with COVID-19 between February 2020 and October 2022 were included. The discovery cohort comprised patients whose disease onset fell before October 1, 2020 (N = 1,023), and the validation cohort comprised the remaining patients (N = 2,278). Pointwise linear and logistic regression models were used to extract 41 features. Reinforcement learning was then used to generate a simple model with high predictive accuracy. The primary evaluation metric was the area under the receiver operating characteristic curve (AUC). The predictive model achieved an AUC of ≥ 0.905 using four features: serum albumin level, lactate dehydrogenase level, age, and neutrophil count. The highest AUC was 0.906 (sensitivity, 0.842; specificity, 0.811) in the discovery cohort and 0.861 (sensitivity, 0.804; specificity, 0.675) in the validation cohort. Simple and well-structured predictive models were established, which may aid in patient management and the selection of therapeutic interventions.
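For readers less familiar with the reported metrics, the following is a minimal sketch of how an AUC, a sensitivity, and a specificity can be computed from predicted risk scores and binary severity labels. It is not the authors' code: the abstract does not state how the metrics were computed or which decision threshold was used, so the function names (`roc_auc`, `sensitivity_specificity`), the threshold of 0.5, and the four toy data points are all assumptions made for illustration and are unrelated to the study data.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Classify a case as severe when score >= threshold (an assumed rule),
    then return (sensitivity, specificity)."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy data for illustration only (1 = severe, 0 = not severe).
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print("AUC:", roc_auc(toy_labels, toy_scores))
    print("sens, spec:", sensitivity_specificity(toy_labels, toy_scores, 0.5))
```

The pairwise formulation is quadratic in the number of cases, which is acceptable for a small illustration; a rank-based implementation would be the usual choice for cohorts of thousands of patients.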