From data to diagnosis: skin cancer image datasets for artificial intelligence.

Author information

Department of Dermatology, Oxford University Hospitals NHS Foundation Trust, Oxford, UK.

Oxford University Clinical Academic Graduate School, University of Oxford, Oxford, UK.

Publication information

Clin Exp Dermatol. 2024 Jun 25;49(7):675-685. doi: 10.1093/ced/llae112.

Abstract

Artificial intelligence (AI) solutions for skin cancer diagnosis continue to gain momentum, edging closer towards broad clinical use. These AI models, particularly deep-learning architectures, require large digital image datasets for development. This review provides an overview of the datasets used to develop AI algorithms and highlights the importance of dataset transparency for the evaluation of algorithm generalizability across varying populations and settings. Current challenges for curation of clinically valuable datasets are detailed, which include dataset shifts arising from demographic variations and differences in data collection methodologies, along with inconsistencies in labelling. These shifts can lead to differential algorithm performance, compromise of clinical utility, and the propagation of discriminatory biases when developed algorithms are implemented in mismatched populations. Limited representation of rare skin cancers and minoritized groups in existing datasets is highlighted, which can further skew algorithm performance. Strategies to address these challenges are presented, which include improving transparency, representation and interoperability. Federated learning and generative methods, which may improve dataset size and diversity without compromising privacy, are also examined. Lastly, we discuss model-level techniques that may address biases entrained through the use of datasets derived from routine clinical care. As the role of AI in skin cancer diagnosis becomes more prominent, ensuring the robustness of underlying datasets is increasingly important.
