Generative Artificial Intelligence Enhancements for Reducing Image-based Training Data Requirements.

Author information

Chen Dake, Han Ying, Duncan Jacque, Jia Lin, Shan Jing

Affiliations

Department of Ophthalmology, University of California, San Francisco, San Francisco, California.

Digillect LLC, San Francisco, California.

Publication information

Ophthalmol Sci. 2024 Apr 14;4(5):100531. doi: 10.1016/j.xops.2024.100531. eCollection 2024 Sep-Oct.

Abstract

OBJECTIVE

Training data fuel and shape the development of artificial intelligence (AI) models. Intensive data requirements are a major bottleneck limiting the success of AI tools in sectors with inherently scarce data. In health care, training data are difficult to curate, raising concerns that the current lack of access to health care among underprivileged social groups will translate into future bias in health care AI. In this report, we developed an autoencoder to grow and enhance inherently scarce datasets and thereby reduce the dependence on big data.

DESIGN

Computational study with open-source data.

SUBJECTS

The data were obtained from 6 open-source datasets comprising patients aged 40-80 years in Singapore, China, India, and Spain.

METHODS

The reported framework generates synthetic images based on real-world patient imaging data. As a test case, we used the autoencoder to expand publicly available training sets of optic disc photos and evaluated the ability of the resulting datasets to train AI models to detect glaucomatous optic neuropathy.
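
The abstract does not describe the autoencoder's architecture, so the following is only a minimal sketch of the general idea (encode real images, perturb the latent codes, decode the result as synthetic images), not the authors' model. It swaps in a tied-weight linear autoencoder fitted in closed form via SVD, which is equivalent to PCA. The function names, `latent_dim`, `noise`, and the choice to let each synthetic image inherit the label of its source image are all assumptions made for illustration.

```python
import numpy as np

def fit_linear_autoencoder(images, latent_dim):
    """Fit a tied-weight linear autoencoder in closed form (equivalent to PCA).

    Stand-in for a trained deep autoencoder; NOT the architecture from the paper.
    """
    x = images.reshape(len(images), -1).astype(float)
    mean = x.mean(axis=0)
    # Rows of vt are orthonormal directions; the leading rows give the
    # rank-limited linear reconstruction with the smallest squared error.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:latent_dim]

def encode(images, mean, weights):
    """Project flattened images onto the latent directions."""
    return (images.reshape(len(images), -1) - mean) @ weights.T

def decode(codes, mean, weights, image_shape):
    """Map latent codes back to image space."""
    return (codes @ weights + mean).reshape((-1,) + tuple(image_shape))

def synthesize(images, n_new, latent_dim=8, noise=0.1, seed=0):
    """Generate n_new synthetic images by jittering latent codes of real ones.

    Returns the synthetic images and the index of the real image each one
    was derived from (so a label could be inherited; an assumption here).
    """
    rng = np.random.default_rng(seed)
    mean, weights = fit_linear_autoencoder(images, latent_dim)
    codes = encode(images, mean, weights)
    picks = rng.integers(0, len(codes), size=n_new)
    # Scale the jitter by each latent dimension's spread in the real data.
    jitter = rng.normal(0.0, noise, size=(n_new, codes.shape[1])) * codes.std(axis=0)
    synthetic = decode(codes[picks] + jitter, mean, weights, images.shape[1:])
    return synthetic, picks
```

Under these assumptions, an expanded training set would be `np.concatenate([real, synthetic])`, with the label of each synthetic image taken from `labels[picks]`.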

MAIN OUTCOME MEASURES

The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of the glaucoma detector. A higher AUC indicates better detection performance.
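
For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it that way (the rank-based, Mann-Whitney formulation). The function name and the 0/1 label encoding are assumptions for illustration, and the example scores are made up rather than taken from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1, where 1 marks a positive case (assumed encoding)
    scores: iterable of detector outputs, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Each positive/negative pair ranked correctly counts 1; a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative values only: 3 of the 4 positive/negative pairs are ordered correctly.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.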

RESULTS

Enhancing datasets with synthetic images generated by the autoencoder produced superior training sets that improved the performance of AI models.

CONCLUSIONS

Our findings here help address the increasingly untenable data volume and quality requirements for AI model development and have implications beyond health care, toward empowering AI adoption for all similarly data-challenged fields.

FINANCIAL DISCLOSURES

The authors have no proprietary or commercial interest in any materials discussed in this article.
