
A Data-Centric Approach to improve performance of deep learning models.

Affiliations

Department of Computer Engineering, U & P U. Patel, CSPIT, CHARUSAT, Changa, Gujarat, India.

Department of Artificial Intelligence and Machine Learning, CSPIT, CHARUSAT, Changa, Gujarat, India.

Publication information

Sci Rep. 2024 Sep 27;14(1):22329. doi: 10.1038/s41598-024-73643-x.

DOI:10.1038/s41598-024-73643-x
PMID:39333381
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11436781/
Abstract

Artificial intelligence has evolved and is now closely associated with deep learning, driven by the availability of vast amounts of data and computing power. Traditionally, researchers have adopted a Model-Centric Approach, focusing on developing new algorithms and models to enhance performance without altering the underlying data. However, Andrew Ng, a prominent figure in the AI community, has recently emphasized better-quality data over better models, which has given rise to the Data-Centric Approach, also known as the data-oriented technique. The transition from a model-oriented to a data-oriented approach has rapidly gained momentum within deep learning. Despite its promise, the Data-Centric Approach faces several challenges, including (a) generating high-quality data, (b) ensuring data privacy, and (c) addressing biases to achieve fairness in datasets. To date, there has been limited effort in preparing quality data. Our work aims to address this gap by focusing on the generation of high-quality data through data augmentation, multi-stage hashing to eliminate duplicate instances, and confident learning to detect and correct noisy labels. Experiments on popular datasets, namely MNIST, Fashion MNIST, and CIFAR-10, were performed using ResNet-18 as the common framework for both the Model-Centric and Data-Centric Approaches. Comparative performance analysis revealed that the Data-Centric Approach consistently outperformed the Model-Centric Approach by a relative margin of at least 3%. This finding highlights the potential for further exploration and adoption of the Data-Centric Approach in domains such as healthcare, finance, education, and entertainment, where the quality of data could significantly enhance performance.
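The abstract names two data-cleaning steps, hash-based duplicate removal and confident learning for noisy labels, without giving implementation details. The sketch below is one possible reading, not the authors' code. It assumes a two-stage hash (an exact SHA-256 pass followed by a coarse average-hash pass over 8-bit grayscale pixels) and a simplified confident-learning rule that uses per-class mean confidence as the threshold. The function names `average_hash`, `deduplicate`, and `find_label_issues` are hypothetical, as are the toy inputs.

```python
import hashlib


def average_hash(pixels):
    """Coarse fingerprint: one bit per pixel, set when the pixel exceeds the image mean."""
    mean = sum(pixels) / len(pixels)
    return tuple(p > mean for p in pixels)


def deduplicate(images):
    """Return indices of images kept after two hashing stages (assumed design).

    Stage 1 drops byte-identical copies using SHA-256 of the raw pixels.
    Stage 2 drops images whose coarse average hash has already been seen.
    Each image is a flat list of 8-bit grayscale values (0..255).
    """
    seen_exact, seen_coarse, kept = set(), set(), []
    for idx, pixels in enumerate(images):
        exact = hashlib.sha256(bytes(pixels)).hexdigest()
        if exact in seen_exact:
            continue
        seen_exact.add(exact)
        coarse = average_hash(pixels)
        if coarse in seen_coarse:
            continue
        seen_coarse.add(coarse)
        kept.append(idx)
    return kept


def find_label_issues(labels, probs):
    """Flag examples whose given label disagrees with a confident prediction.

    probs[i][j] is an out-of-sample predicted probability of class j for
    example i. The threshold for class j is the mean of probs[i][j] over
    examples labeled j (1.0 if the class never appears). An example is
    flagged when its most probable above-threshold class is not its label.
    """
    n_classes = len(probs[0])
    thresholds = []
    for j in range(n_classes):
        own = [p[j] for y, p in zip(labels, probs) if y == j]
        thresholds.append(sum(own) / len(own) if own else 1.0)
    issues = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        confident = [j for j in range(n_classes) if p[j] >= thresholds[j]]
        if confident and max(confident, key=lambda j: p[j]) != y:
            issues.append(i)
    return issues


# Toy data: image 1 is an exact copy of image 0, image 2 is a near-copy.
images = [[0, 0, 255, 255], [0, 0, 255, 255], [10, 5, 250, 240], [255, 255, 0, 0]]
print(deduplicate(images))  # [0, 3]

# Toy data: example 2 is labeled 0 but confidently predicted as class 1.
labels = [0, 0, 0, 1, 1, 1]
probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.1, 0.9], [0.3, 0.7], [0.2, 0.8]]
print(find_label_issues(labels, probs))  # [2]
```

In practice the probabilities would come from cross-validated predictions of a model such as the ResNet-18 mentioned in the abstract, and flagged examples would be reviewed or relabeled rather than dropped automatically.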


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/511e/11436781/23cafff605f4/41598_2024_73643_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/511e/11436781/695273f48aa5/41598_2024_73643_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/511e/11436781/b367ae77d142/41598_2024_73643_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/511e/11436781/dfc7bc3a18f0/41598_2024_73643_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/511e/11436781/3fb45be30d1f/41598_2024_73643_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/511e/11436781/99edd3402065/41598_2024_73643_Fig6_HTML.jpg
