Xie Juan, Ma Run-Wei, Feng Yu-Jing, Qiao Yuan, Zhu Hong-Yan, Tao Xing-Ping, Chen Wen-Juan, Liu Cong-Yun, Li Tan, Liu Kai, Cheng Li-Ming
Department of Anesthesiology, Kunming Children's Hospital, Kunming City, Yunnan Province, China.
Department of Cardiac Surgery, Fuwai Yunnan Hospital, Chinese Academy of Medical Sciences/Affiliated Cardiovascular Hospital of Kunming Medical University, Kunming City, Yunnan Province, China.
BMC Infect Dis. 2025 Mar 27;25(1):428. doi: 10.1186/s12879-025-10797-7.
Pertussis is a highly contagious respiratory disease. Although vaccination has reduced its incidence, cases have resurfaced in certain regions due to immune escape and waning vaccine efficacy. Promptly identifying high-risk patients is imperative to mitigate transmission and avert complications. However, current diagnostic methods, including PCR and bacterial culture, are time-consuming and expensive. Some studies have attempted to develop risk prediction models based on multivariate data, but their performance leaves room for improvement. This study therefore aims to further optimize and expand the risk assessment tool to identify high-risk individuals more efficiently and to compensate for the shortcomings of existing diagnostic methods.
The aim of this study was to develop an efficient pertussis risk prediction model that generalizes well across different datasets. The model was constructed using machine learning techniques based on multicenter data, and key features were screened. The performance and generalization ability of the model were evaluated by deploying it on an online platform. The study also aims to provide a rapid and accurate auxiliary diagnostic tool for clinical practice that helps identify high-risk patients in a timely manner, optimize early intervention strategies, and reduce both the risk of complications and transmission, thereby improving the efficiency of public health management.
First, data from 1085 patients with suspected pertussis were collected from 7 centers, and ten key features were identified using LASSO regression and the Boruta algorithm: PDW-MPV-RATIO, SII, white blood cells, platelet distribution width, mean platelet volume, lymphocytes, cough duration, vaccination, fever, and lytic lymphocytes. Eight models were then trained and validated on these features to assess their performance, and external datasets were used to confirm their generalization ability. Finally, an online platform was constructed so that clinicians can use the models in real time.
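Two of the selected features are derived from routine blood counts. The sketch below shows how they might be computed under two assumptions the abstract does not confirm: that PDW-MPV-RATIO is platelet distribution width divided by mean platelet volume, and that SII follows the commonly used definition of the systemic immune-inflammation index (platelets × neutrophils / lymphocytes). The class, field, and function names are hypothetical, and the input values are made up for illustration rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass
class BloodCount:
    """Routine blood count values (field names are illustrative assumptions)."""
    pdw: float          # platelet distribution width, as reported by the analyzer
    mpv: float          # mean platelet volume
    platelets: float    # platelet count, 10^9 cells/L
    neutrophils: float  # neutrophil count, 10^9 cells/L
    lymphocytes: float  # lymphocyte count, 10^9 cells/L


def pdw_mpv_ratio(cbc: BloodCount) -> float:
    """Assumed definition: PDW divided by MPV."""
    if cbc.mpv <= 0:
        raise ValueError("MPV must be positive")
    return cbc.pdw / cbc.mpv


def sii(cbc: BloodCount) -> float:
    """Assumed definition: platelets * neutrophils / lymphocytes."""
    if cbc.lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return cbc.platelets * cbc.neutrophils / cbc.lymphocytes


# Made-up values chosen only to keep the arithmetic easy to follow.
example = BloodCount(pdw=16.0, mpv=10.0, platelets=300.0,
                     neutrophils=4.0, lymphocytes=8.0)
features = {"PDW-MPV-RATIO": pdw_mpv_ratio(example), "SII": sii(example)}
print(features)
```

Guarding against zero denominators matters here because both ratios are undefined for a missing or zero count, which would otherwise surface as an opaque division error when preparing features.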
The random forest model showed excellent discrimination, with an AUC of 0.98 in the validation set and 0.97 in the external validation set. Calibration curve and decision curve analyses showed that the model was highly accurate in predicting low-to-medium risk patients, which could help clinicians avoid unnecessary interventions, especially in resource-limited settings. Applying this model can help optimize the early identification and management of high-risk patients and improve clinical decision-making.
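To make the two evaluation measures concrete, the sketch below computes an AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, and the net benefit used in decision curve analysis at a single threshold probability. It is a generic illustration, not the study's code: the function names are hypothetical, and the labels and scores are invented toy data.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Net benefit at a threshold probability: TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy data: 1 = confirmed case, 0 = not a case; scores are predicted risks.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
print(auc(labels, scores))
print(net_benefit(labels, scores, threshold=0.5))
```

A decision curve repeats the net benefit calculation across a range of thresholds and compares the model with the "treat all" and "treat none" strategies. The pairwise AUC loop is quadratic in sample size, which is acceptable for a sketch but would normally be replaced by a rank-based computation on larger datasets.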
The pertussis prediction model devised in this study was validated using multicenter data, exhibited high prediction performance, and was successfully implemented online. Future research should broaden the data sources and incorporate dynamic data to enhance the model's accuracy and applicability.