Chew Rusheng, Woods Marion L, Paterson David L
Mathematical and Economic Modelling Department, Mahidol Oxford Tropical Medicine Research Unit, c/o Faculty of Tropical Medicine, Mahidol University, 3rd floor, 60th Anniversary Chalermprakiat Building, 420/6 Ratchawithi Road, Ratchathewi, Bangkok 10400, Thailand.
Centre for Tropical Medicine and Global Health, University of Oxford, Oxford OX3 7LG, UK.
Int Health. 2025 Sep 3;17(5):804-808. doi: 10.1093/inthealth/ihae052.
The global burden of the opportunistic fungal disease Pneumocystis jirovecii pneumonia (PJP) remains substantial. Polymerase chain reaction (PCR) on nasopharyngeal swabs (NPS) is highly specific and may be a viable alternative to the diagnostic gold standard, PCR on invasively collected lower respiratory tract specimens, but its sensitivity is low. Incorporating NPS PCR results into machine learning models may improve sensitivity.
Three supervised multivariable diagnostic models (random forest, logistic regression and extreme gradient boosting) were constructed and validated using a 111-person Australian dataset. The predictors were age, gender, immunosuppression type and NPS PCR result. Model performance metrics such as accuracy, sensitivity, specificity and predictive values were compared to select the best-performing model.
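The performance metrics named above all derive from a 2x2 confusion matrix of model predictions against the reference diagnosis. The following is a minimal sketch of those definitions, not the authors' code: the `ConfusionCounts` type, the `diagnostic_metrics` helper and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of model predictions against the reference diagnosis."""
    tp: int  # predicted PJP, reference PJP
    fp: int  # predicted PJP, reference not PJP
    tn: int  # predicted not PJP, reference not PJP
    fn: int  # predicted not PJP, reference PJP


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the standard diagnostic accuracy metrics as proportions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),   # true positive rate
        "specificity": c.tn / (c.tn + c.fp),   # true negative rate
        "ppv": c.tp / (c.tp + c.fp),           # positive predictive value
        "npv": c.tn / (c.tn + c.fn),           # negative predictive value
    }


# Hypothetical counts for illustration only; these are NOT the study's data.
example = ConfusionCounts(tp=8, fp=2, tn=7, fn=3)
metrics = diagnostic_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Computing every metric from the same four counts makes the comparison between candidate models consistent, since each model is scored on identical definitions.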
The logistic regression model performed best, with 80% accuracy, improving sensitivity to 86% and maintaining acceptable specificity of 70%. Using this model, positive and negative NPS PCR results indicated post-test probabilities of 84% (likely PJP) and 26% (unlikely PJP), respectively.
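For readers less familiar with post-test probabilities, the general relationship between a pretest probability, a test's sensitivity and specificity, and the resulting post-test probability can be sketched with likelihood ratios. This is the generic Bayesian calculation and is not necessarily how the probabilities reported above were obtained, since those come from the fitted model. The function name `post_test_probability` and all input values below are hypothetical.

```python
def post_test_probability(pretest: float, sensitivity: float,
                          specificity: float, positive: bool) -> float:
    """Update a pretest probability with a test result via likelihood ratios."""
    for value in (pretest, sensitivity, specificity):
        if not 0.0 < value < 1.0:
            raise ValueError("inputs must lie strictly between 0 and 1")

    if positive:
        # LR+ = P(test positive | disease) / P(test positive | no disease)
        likelihood_ratio = sensitivity / (1.0 - specificity)
    else:
        # LR- = P(test negative | disease) / P(test negative | no disease)
        likelihood_ratio = (1.0 - sensitivity) / specificity

    pretest_odds = pretest / (1.0 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical inputs for illustration only; these are NOT the study's values.
after_positive = post_test_probability(0.5, 0.8, 0.9, positive=True)
after_negative = post_test_probability(0.5, 0.8, 0.9, positive=False)

print(f"after a positive result: {after_positive:.3f}")
print(f"after a negative result: {after_negative:.3f}")
```

A positive result raises the probability above the pretest value and a negative result lowers it, which is the pattern the reported post-test probabilities follow.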
The logistic regression model should be externally validated in a wider range of settings. As the predictors are simple, routinely collected patient variables, this model may represent a diagnostic advance suitable for settings where collection of lower respiratory tract specimens is difficult but PCR is available.