A review of the machine learning datasets in mammography, their adherence to the FAIR principles and the outlook for the future.

Affiliations

Alixir Technologies Pty Ltd, Sydney, NSW, Australia.

Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, NSW, Australia.

Publication information

Sci Data. 2023 Sep 8;10(1):595. doi: 10.1038/s41597-023-02430-6.

Abstract

The increasing rates of breast cancer, particularly in emerging economies, have led to interest in scalable deep learning-based solutions that improve the accuracy and cost-effectiveness of mammographic screening. However, such tools require large volumes of high-quality training data, which can be challenging to obtain. This paper combines the experience of an AI startup with an analysis of how well the eight available mammography datasets adhere to the FAIR (Findable, Accessible, Interoperable, Reusable) principles. It demonstrates that the datasets vary considerably, particularly in their interoperability, as each dataset is skewed towards a particular clinical use case. The mix of digital captures and scanned film compounds this variability, as do differences in licensing terms, ease of access, labelling reliability, and file formats. Improving interoperability through adherence to standards, such as the BI-RADS criteria for labelling and annotation, and through a consistent file format could markedly improve access to, and use of, larger amounts of standardized data. The amount of usable data could be increased further by GAN-based synthetic data generation, paving the way towards better health outcomes for breast cancer.
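The abstract argues that a shared labelling standard such as BI-RADS would improve interoperability between datasets. As a minimal sketch of what that could mean in practice, the Python snippet that follows maps free-form assessment labels, written the way different datasets might store them, onto the BI-RADS final assessment categories (0 to 6, with category 4 optionally subdivided into 4A, 4B and 4C). The function name `normalize_birads` and the accepted input spellings are hypothetical assumptions for illustration, not the format of any particular dataset reviewed in the paper.

```python
import re

# BI-RADS final assessment categories; category 4 may be subdivided
# into 4A, 4B and 4C. Stored as canonical upper-case strings.
BIRADS_CATEGORIES = {"0", "1", "2", "3", "4", "4A", "4B", "4C", "5", "6"}

# Hypothetical prefix pattern: an optional "BI-RADS"/"BIRADS" prefix
# followed by any mix of spaces, underscores, colons or hyphens.
_PREFIX = re.compile(r"^BI-?RADS[\s_:-]*")


def normalize_birads(raw: str) -> str:
    """Map a free-form label such as 'BI-RADS 4c' or 'birads_3' to a
    canonical BI-RADS category string; raise ValueError if unrecognized."""
    text = _PREFIX.sub("", raw.strip().upper())
    if text not in BIRADS_CATEGORIES:
        # Reject rather than guess, so ambiguous labels get manual review.
        raise ValueError(f"not a BI-RADS assessment category: {raw!r}")
    return text


if __name__ == "__main__":
    # Hypothetical labels as two differently formatted datasets might store them.
    for label in ["BI-RADS 4c", "birads_3", " 2 "]:
        print(repr(label), "->", normalize_birads(label))
```

Raising an error on unrecognized labels, instead of silently mapping them to a default category, keeps labelling inconsistencies visible, which is the kind of reliability problem the review highlights.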
