Research Fellow, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts.
Research Fellow, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts; Mass General Brigham AI, Boston, Massachusetts.
J Am Coll Radiol. 2024 Oct;21(10):1575-1582. doi: 10.1016/j.jacr.2024.06.014. Epub 2024 Jul 1.
PURPOSE: We compared the performance of generative artificial intelligence (AI) (Augmented Transformer Assisted Radiology Intelligence [ATARI, Microsoft Nuance, Microsoft Corporation, Redmond, Washington]) and natural language processing (NLP) tools for identifying laterality errors in radiology reports and images.
METHODS: We used an NLP-based tool (mPower, Microsoft Nuance) to identify radiology reports flagged for laterality errors in its Quality Assurance Dashboard. The NLP model detects and highlights laterality mismatches in radiology reports. From an initial pool of 1,124 radiology reports flagged by the NLP tool for laterality errors, we selected and evaluated 898 reports that encompassed radiography, CT, MRI, and ultrasound modalities to ensure comprehensive coverage. A radiologist reviewed each radiology report to assess whether the flagged laterality errors were present (reporting error, true positive) or absent (NLP error, false positive). Next, we applied ATARI to 237 radiology reports and images with consecutive NLP true-positive (118 reports) and false-positive (119 reports) laterality errors. We estimated the accuracy of the NLP and generative AI tools in identifying overall and modality-wise laterality errors.
RESULTS: Among the 898 NLP-flagged laterality errors, 64% (574 of 898) were NLP errors and 36% (324 of 898) were reporting errors. The text query ATARI feature correctly identified the absence of laterality mismatch (NLP false-positives) with a 97.4% accuracy (115 of 118 reports; 95% confidence interval [CI] = 96.5%-98.3%). Combined vision and text query resulted in 98.3% accuracy (116 of 118 reports or images; 95% CI = 97.6%-99.0%), and query alone had a 98.3% accuracy (116 of 118 images; 95% CI = 97.6%-99.0%).
CONCLUSION: The generative AI-empowered ATARI prototype outperformed the assessed NLP tool for determining true and false laterality errors in radiology reports while enabling an image-based laterality determination. Underlying errors in ATARI text query in complex radiology reports emphasize the need for further improvement in the technology.
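The proportions reported in RESULTS can be recomputed from the stated counts. The sketch below is illustrative only: the function names are hypothetical, and the Wilson score interval is an assumed choice, because the abstract does not say how its confidence intervals were calculated. The intervals printed here may therefore differ from those reported.

```python
import math


def proportion(successes: int, total: int) -> float:
    """Return successes / total as a fraction between 0 and 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successes / total


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    ASSUMPTION: the abstract does not name its CI method; Wilson is
    used here only as a common textbook choice.
    """
    p = proportion(successes, total)
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Counts taken directly from the RESULTS paragraph.
COUNTS = {
    "NLP errors among NLP-flagged reports": (574, 898),
    "Reporting errors among NLP-flagged reports": (324, 898),
    "ATARI text query": (115, 118),
    "ATARI combined vision and text query": (116, 118),
}

if __name__ == "__main__":
    for label, (k, n) in COUNTS.items():
        low, high = wilson_interval(k, n)
        print(f"{label}: {k}/{n} = {proportion(k, n):.1%} "
              f"(Wilson 95% CI {low:.1%} to {high:.1%})")
```

Running the script prints each proportion with its Wilson interval, which makes it easy to check the percentages against the counts in the abstract.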