Diversity, inclusivity and traceability of mammography datasets used in development of Artificial Intelligence technologies: a systematic review.

Authors

Laws Elinor, Palmer Joanne, Alderman Joseph, Sharma Ojasvi, Ngai Victoria, Salisbury Thomas, Hussain Gulmeena, Ahmed Sumiya, Sachdeva Gagandeep, Vadera Sonam, Mateen Bilal, Matin Rubeta, Kuku Stephanie, Calvert Melanie, Gath Jacqui, Treanor Darren, McCradden Melissa, Mackintosh Maxine, Gichoya Judy, Trivedi Hari, Denniston Alastair K, Liu Xiaoxuan

Affiliations

University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK; Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK; National Institute for Health and Care Research Birmingham Biomedical Research Centre, University of Birmingham, Birmingham, UK.

University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK; Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK.

Publication information

Clin Imaging. 2025 Feb;118:110369. doi: 10.1016/j.clinimag.2024.110369. Epub 2024 Nov 26.

Abstract

PURPOSE

There are many radiological datasets for breast cancer, some of which have supported the development of AI medical devices for breast cancer screening and image classification. This review aims to identify mammography datasets (including digitised screen film mammography, 2D digital mammography and digital breast tomosynthesis) used in the development of AI technologies and to present their characteristics, including transparency of documentation, content, populations included and accessibility.

MATERIALS AND METHODS

MEDLINE and Google Dataset searches identified studies describing AI technology development and referencing breast imaging datasets up to June 2024. The characteristics of each dataset are summarised. In particular, the accompanying documentation was reviewed with a focus on diversity and inclusion of populations represented within each dataset.

RESULTS

The literature search found 254 referenced datasets: 190 were privately held, 36 had barriers that prevented access, and 28 were accessible. Most datasets originated from Europe, East Asia and North America. Individuals' attributes were poorly reported: 32 (12%) datasets reported race or ethnicity, and 76 (30%) reported female/male categories, with only one dataset explicitly defining whether these categories represented sex or gender.
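The counts in this paragraph can be checked with a short calculation. The sketch below is only an illustration: it assumes all 254 datasets form the denominator for every proportion, which the abstract does not state explicitly, and the variable and function names are hypothetical. The one-decimal shares are computed here and do not appear in the abstract, which reports the attribute shares as 12% and 30%; the full paper may use different rounding or denominators.

```python
# Counts as reported in the RESULTS paragraph of the abstract.
access_counts = {
    "privately held": 190,
    "access barriers": 36,
    "accessible": 28,
}

# Assumption: the three access categories partition all referenced datasets.
total = sum(access_counts.values())


def share(count, denominator):
    """Return count as a percentage of denominator, rounded to one decimal."""
    return round(100 * count / denominator, 1)


# Share of datasets in each access category.
access_shares = {name: share(n, total) for name, n in access_counts.items()}

# Share of datasets reporting each attribute (same assumed denominator).
attribute_shares = {
    "race or ethnicity": share(32, total),
    "female/male categories": share(76, total),
}

print(total)
print(access_shares)
print(attribute_shares)
```

Under these assumptions, the three access categories sum to the 254 datasets stated in the abstract, and accessible datasets make up roughly one in nine of them.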

CONCLUSION

Through this review, we demonstrate gaps in the data landscape for mammography, highlighting poor representation globally. To ensure datasets in breast imaging have maximum utility for researchers, their characteristics should be documented and limitations of datasets, such as their representativeness of populations and settings, should inform scientific efforts to translate data-driven insights into technologies and discoveries.
