The unreliability of crackles: insights from a breath sound study using physicians and artificial intelligence.

Affiliations

Department of Emergency Medicine, National Taiwan University Hospital Hsin-Chu Branch, Hsinchu City, Taiwan, R.O.C.

College of Semiconductor Research, National Tsing Hua University, Hsinchu City, Taiwan, R.O.C.

Publication information

NPJ Prim Care Respir Med. 2024 Oct 15;34(1):28. doi: 10.1038/s41533-024-00392-9.

Abstract

BACKGROUND AND INTRODUCTION

Compared with other physical assessment methods, inconsistency in respiratory evaluations remains a major challenge.

OBJECTIVES

This study aims to evaluate how identification ability differs across types of breath sounds.

METHODS/DESCRIPTION

In this prospective study, breath sounds from the Formosa Archive of Breath Sound were labeled by five physicians. Six artificial intelligence (AI) breath sound interpretation models were developed: one based on all labeled data and one based on the labels from each of the five physicians. After labeling by the AI models and the physicians, discrepant labels were considered doubtful and were relabeled by two additional physicians. The final labels were determined by a majority vote among the physicians. Breath sound identification by humans and AI was evaluated using sensitivity, specificity, and the area under the receiver-operating characteristic curve (AUROC).
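The evaluation steps named above (majority vote, sensitivity, specificity, AUROC) can be illustrated with a minimal Python sketch. The abstract does not describe the study's actual implementation, so everything here is an assumption for illustration: the function names `majority_vote`, `sensitivity_specificity`, and `auroc` are hypothetical, the toy votes and scores are made up, and the tie handling is a placeholder because the abstract does not say how ties were resolved.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most common label among the votes, or None on a tie."""
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # assumption: the abstract does not specify tie handling
    return counts[0][0]


def sensitivity_specificity(truth, predicted):
    """Compute (sensitivity, specificity) from boolean labels."""
    tp = sum(t and p for t, p in zip(truth, predicted))
    fn = sum(t and not p for t, p in zip(truth, predicted))
    tn = sum(not t and not p for t, p in zip(truth, predicted))
    fp = sum(not t and p for t, p in zip(truth, predicted))
    return tp / (tp + fn), tn / (tn + fp)


def auroc(truth, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: three physician votes per recording (True = crackles heard).
votes_per_file = [
    [True, True, False],
    [False, False, True],
    [True, True, True],
    [False, False, False],
    [False, True, False],
]
gold = [majority_vote(v) for v in votes_per_file]

# Hypothetical model scores, thresholded at 0.5 to obtain binary predictions.
scores = [0.9, 0.7, 0.6, 0.2, 0.4]
model_pred = [s >= 0.5 for s in scores]

sens, spec = sensitivity_specificity(gold, model_pred)
auc = auroc(gold, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUROC={auc:.3f}")
```

In this toy example the model flags every recording with crackles but also one without, which is the same pattern of high sensitivity with lower specificity that the metrics are designed to expose.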

RESULTS/OUTCOME

A total of 11,532 breath sound files were labeled, and 579 doubtful labels were identified. After relabeling and exclusion, 305 labels with a gold standard remained. For wheezing, both human physicians and the AI model demonstrated good sensitivities (89.5% vs. 86.0%) and good specificities (96.4% vs. 95.2%). For crackles, both human physicians and the AI model showed good sensitivities (93.9% vs. 80.3%) but poor specificities (56.6% vs. 65.9%). AUROC values for both physicians and the AI model were lower for crackle identification than for wheezing.

CONCLUSION

Even with the assistance of artificial intelligence tools, accurately identifying crackles remains challenging compared with identifying wheezing. Consequently, crackles are unreliable for medical decision-making, and further examination is warranted.
