Spectrum bias in algorithms derived by artificial intelligence: a case study in detecting aortic stenosis using electrocardiograms.

Authors

Tseng Andrew S, Shelly-Cohen Michal, Attia Itzhak Z, Noseworthy Peter A, Friedman Paul A, Oh Jae K, Lopez-Jimenez Francisco

Affiliation

Department of Cardiovascular Medicine, Mayo Clinic, 200 First Street Southwest, Rochester, MN 55905, USA.

Publication information

Eur Heart J Digit Health. 2021 Jul 14;2(4):561-567. doi: 10.1093/ehjdh/ztab061. eCollection 2021 Dec.

Abstract

AIMS

Spectrum bias can arise when a diagnostic test is derived from study populations with different disease spectra than the target population, resulting in poor generalizability. We used a real-world artificial intelligence (AI)-derived algorithm to detect severe aortic stenosis (AS) to experimentally assess the effect of spectrum bias on test performance.

METHODS AND RESULTS

All adult patients seen at the Mayo Clinic between 1 January 1989 and 30 September 2019 who had a transthoracic echocardiogram within 180 days after an electrocardiogram (ECG) were identified. Two models were developed from two distinct patient cohorts: a whole-spectrum cohort comparing severe AS to any non-severe AS and an extreme-spectrum cohort comparing severe AS to no AS at all. Model performance was assessed. Overall, 258 607 patients had valid ECG-echocardiogram pairs. The area under the receiver operating characteristic curve was 0.87 and 0.91 for the whole-spectrum and extreme-spectrum models, respectively. Sensitivity and specificity were 80% and 81%, respectively, for the whole-spectrum model and 84% and 84%, respectively, for the extreme-spectrum model. When the AI-ECG derived from the extreme-spectrum cohort was applied to patients in the whole-spectrum cohort, the sensitivity, specificity, and area under the curve dropped to 83%, 73%, and 0.86, respectively.

CONCLUSION

While the algorithm performed robustly in identifying severe AS, this study shows that limiting datasets to clearly positive or negative labels leads to overestimation of test performance when testing an AI algorithm in the setting of classifying severe AS using ECG data. While the effect of the bias may be modest in this example, clinicians should be aware of the existence of such a bias in AI-derived algorithms.
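
The mechanism behind this overestimation can be illustrated with a minimal sketch. The scores below are invented for illustration and are not data from the study; the helper names (sensitivity_specificity, auc) and the 0.5 threshold are likewise assumptions. The sketch evaluates one fixed set of hypothetical model scores twice: once against "no AS" controls only (extreme spectrum) and once with mild/moderate AS patients added to the negative class (whole spectrum).

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity); label 1 = severe AS, 0 = not severe."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly, counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs (invented, NOT taken from the study).
severe = [0.9, 0.8, 0.7, 0.6, 0.4]           # severe AS (positives)
no_as = [0.1, 0.2, 0.2, 0.3, 0.6]            # no AS at all
mild_moderate = [0.3, 0.45, 0.55, 0.65, 0.7]  # non-severe AS, harder negatives

threshold = 0.5  # assumed operating point

# Extreme-spectrum evaluation: severe AS versus no AS only.
extreme_scores = severe + no_as
extreme_labels = [1] * len(severe) + [0] * len(no_as)

# Whole-spectrum evaluation: severe AS versus every non-severe patient.
whole_scores = severe + no_as + mild_moderate
whole_labels = [1] * len(severe) + [0] * (len(no_as) + len(mild_moderate))

sens_ext, spec_ext = sensitivity_specificity(extreme_scores, extreme_labels, threshold)
sens_whole, spec_whole = sensitivity_specificity(whole_scores, whole_labels, threshold)
auc_ext = auc(extreme_scores, extreme_labels)
auc_whole = auc(whole_scores, whole_labels)

print(f"extreme spectrum: sens={sens_ext:.2f} spec={spec_ext:.2f} auc={auc_ext:.2f}")
print(f"whole spectrum:   sens={sens_whole:.2f} spec={spec_whole:.2f} auc={auc_whole:.2f}")
```

In this toy setup the positives are identical in both evaluations, so sensitivity does not move, while specificity and the area under the curve fall once intermediate-severity patients join the negative class. That is the same direction as the drop reported above, although the magnitudes here are arbitrary.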

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1cd0/9707965/566bb5f1235f/ztab061f3.jpg