Efficiency and safety of automated label cleaning on multimodal retinal images.

Authors

Lin Tian, Wang Meng, Lin Aidi, Mai Xiaoting, Liang Huiyu, Tham Yih-Chung, Chen Haoyu

Affiliations

Joint Shantou International Eye Center, Shantou University and the Chinese University of Hong Kong, Shantou, Guangdong, 515041, China.

Shantou University Medical College, Shantou, Guangdong, 515041, China.

Publication information

NPJ Digit Med. 2025 Jan 5;8(1):10. doi: 10.1038/s41746-024-01424-x.

Abstract

Label noise is a common and important problem that can degrade model performance in artificial intelligence. This study assessed the effectiveness and potential risks of automated label cleaning with an open-source framework, Cleanlab, on multi-category datasets of fundus photography and optical coherence tomography into which label noise of 0 to 70% was intentionally introduced. After six cycles of automatic cleaning, label accuracy improved significantly (by 3.4-62.9%), as did the dataset quality score (DQS, by 5.1-74.4%). Most label errors (86.6-97.5%) were correctly fixed, while few were missed (0.5-2.8%) or misclassified (0.4-10.6%). The classification accuracy of RETFound improved significantly, by 0.3-52.9%, when it was trained on the cleaned datasets. We also developed a DQS-guided cleaning strategy to mitigate over-cleaning. In external validation, cleaning the EyePACS and APTOS-2019 datasets raised label accuracy by 1.3% and 1.8%, respectively. This approach automates label correction, enhances dataset reliability, and strengthens model performance efficiently and safely.
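The evaluation setup described in the abstract (inject a known fraction of wrong labels, clean, then compare against the ground truth) can be illustrated with a small standard-library sketch. It is not the authors' code and does not call Cleanlab. The function names, the uniform label-flipping scheme, and the exact definitions of "corrected", "missed", and "misclassified" below are assumptions made for illustration; the paper may define these quantities differently.

```python
import random


def inject_label_noise(labels, noise_rate, num_classes, seed=0):
    """Flip a fixed fraction of labels to a different, uniformly chosen class.

    Assumption: noise is symmetric (uniform over the wrong classes).
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(labels))
    for i in rng.sample(range(len(labels)), n_flip):
        wrong_classes = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(wrong_classes)
    return noisy


def label_accuracy(labels, true_labels):
    """Fraction of labels that agree with the ground truth."""
    matches = sum(a == b for a, b in zip(labels, true_labels))
    return matches / len(true_labels)


def cleaning_outcomes(true_labels, noisy_labels, cleaned_labels):
    """Count what happened to each label that was wrong before cleaning.

    Assumed definitions (hypothetical, for illustration only):
      corrected     - wrong label changed to the true class
      missed        - wrong label left unchanged
      misclassified - wrong label changed to another wrong class
    """
    outcomes = {"corrected": 0, "missed": 0, "misclassified": 0}
    for true, noisy, cleaned in zip(true_labels, noisy_labels, cleaned_labels):
        if noisy == true:
            continue  # label was already correct before cleaning
        if cleaned == true:
            outcomes["corrected"] += 1
        elif cleaned == noisy:
            outcomes["missed"] += 1
        else:
            outcomes["misclassified"] += 1
    return outcomes


if __name__ == "__main__":
    true_labels = [0, 1, 2, 3] * 25  # 100 samples, 4 classes
    noisy = inject_label_noise(true_labels, noise_rate=0.3, num_classes=4)
    print("accuracy before cleaning:", label_accuracy(noisy, true_labels))
    # Stand-in for a real cleaning step: pretend every label was restored.
    print(cleaning_outcomes(true_labels, noisy, cleaned_labels=true_labels))
```

In a real pipeline the `cleaned_labels` argument would come from the cleaning tool's output rather than from the ground truth, and the same bookkeeping could be repeated after each cleaning cycle.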


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c4a6/11701072/d50cbb3c4b01/41746_2024_1424_Fig1_HTML.jpg
