Group Health Research Institute, Seattle, WA 98101, USA.
Pharmacoepidemiol Drug Saf. 2013 Aug;22(8):834-41. doi: 10.1002/pds.3418. Epub 2013 Apr 1.
This study aimed to develop Natural Language Processing (NLP) approaches to supplement manual outcome validation, specifically to validate pneumonia cases from chest radiograph reports.
We trained one NLP system, ONYX, on chest radiograph reports from children and adults that had previously been manually reviewed, then assessed its validity on a test set of 5000 reports. Because we aimed to substantially decrease manual review rather than replace it entirely, we classified reports as: (1) consistent with pneumonia; (2) inconsistent with pneumonia; or (3) requiring manual review because of complex features. We developed two processes, one tailored to optimize accuracy and one tailored to minimize manual review. Using logistic regression, we jointly modeled the sensitivity and specificity of ONYX in relation to patient age, comorbidity, and care setting. We estimated positive and negative predictive values (PPV and NPV) assuming the pneumonia prevalence in the source data.
Tailored for accuracy, ONYX identified 25% of reports as requiring manual review (34% of true pneumonias and 18% of non-pneumonias). For the remainder, ONYX's sensitivity was 92% (95% CI 90-93%), specificity 87% (86-88%), PPV 74% (72-76%), and NPV 96% (96-97%). Tailored to minimize manual review, ONYX classified 12% as needing manual review. For the remainder, ONYX had sensitivity 75% (72-77%), specificity 95% (94-96%), PPV 86% (83-88%), and NPV 91% (90-91%).
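The dependence of PPV and NPV on prevalence can be illustrated with a minimal sketch of the standard Bayes' rule conversion. The function name `predictive_values` and the prevalence of 0.30 are assumptions made for illustration; the abstract does not report the prevalence used, and this is not the authors' code.

```python
import math


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence.

    All three inputs are proportions in [0, 1]. Hypothetical helper,
    not part of ONYX or the published analysis.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # Expected proportions of the population falling in each cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    # Guard against empty denominators (e.g. nothing classified positive).
    positives = true_pos + false_pos
    negatives = true_neg + false_neg
    ppv = true_pos / positives if positives > 0 else math.nan
    npv = true_neg / negatives if negatives > 0 else math.nan
    return ppv, npv


# Accuracy-tailored point estimates from the abstract, combined with an
# ASSUMED prevalence of 0.30 (illustrative only, not reported in the text).
ppv, npv = predictive_values(sensitivity=0.92, specificity=0.87, prevalence=0.30)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Lowering the assumed prevalence reduces PPV and raises NPV for the same sensitivity and specificity, which is why predictive values estimated from one source population may not transfer to settings where pneumonia is more or less common.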
For pneumonia validation, ONYX can replace almost 90% of manual review while maintaining low to moderate misclassification rates. It can be tailored for different outcomes and study needs and thus warrants exploration in other settings.