Preparing clinical research data for artificial intelligence readiness: insights from the National Institute of Diabetes and Digestive and Kidney Diseases data centric challenge.

Author information

Domagalski Marcin J, Lu Yin, Pilozzi Alexander, Williamson Alicia, Chilappagari Padmini, Luker Emma, Shelley Courtney D, Dabic Anya, Keller Michael A, Rodriguez Rebecca M, Lawlor Sharon, Thangudu Ratna R

Affiliations

Health Analytics, Research and Technology (HART), ICF, Rockville, MD 20850, United States.

Health and Life Sciences, Booz Allen Hamilton, Inc., McLean, VA 22102, United States.

Publication information

J Am Med Inform Assoc. 2025 Oct 1;32(10):1609-1616. doi: 10.1093/jamia/ocaf114.

DOI: 10.1093/jamia/ocaf114
PMID: 40705952
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12451923/
Abstract

OBJECTIVES

The success of artificial intelligence (AI) and machine learning (ML) approaches in biomedical research depends on the quality of the underlying data. The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Data Centric Challenge was designed to tackle the problem of making raw clinical research data AI-ready, with a focus on type 1 diabetes studies available in the NIDDK Central Repository (NIDDK-CR). This paper presents a structured methodology for enhancing the AI readiness of clinical datasets.

MATERIALS AND METHODS

We detail a systematic approach to data aggregation and preprocessing, including binning continuous data, processing text features, managing missing values, and encoding categorical variables, while maintaining data integrity and compatibility with ML algorithms.
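As a rough illustration of the four preprocessing steps named above (binning continuous data, processing text features, managing missing values, and encoding categorical variables), the following is a minimal Python sketch on a few hypothetical records. The field names (`age`, `hba1c`, `sex`, `notes`), the bin edges, median imputation with a missingness flag, and one-hot encoding are all illustrative assumptions, not the specific techniques reported by the authors.

```python
from statistics import median

# Hypothetical toy records. Field names and values are illustrative only and
# are NOT taken from the NIDDK-CR type 1 diabetes studies.
RECORDS = [
    {"age": 7.5, "hba1c": 8.1, "sex": "F", "notes": "  Family history of T1D "},
    {"age": 12.0, "hba1c": None, "sex": "M", "notes": None},
    {"age": 15.2, "hba1c": 6.9, "sex": None, "notes": "No family HISTORY"},
]

# Hypothetical age bins, each a half-open interval [lo, hi) with a label.
AGE_BINS = [(0, 10, "age_0_9"), (10, 18, "age_10_17")]


def bin_value(x, bins, other="age_other"):
    """Binning: return the label of the first interval [lo, hi) containing x."""
    for lo, hi, label in bins:
        if lo <= x < hi:
            return label
    return other


def impute_median(values):
    """Missing values: fill None with the median of the observed values.

    Returns (value, was_missing) pairs so a missingness indicator is kept.
    Assumes at least one value is observed.
    """
    fill = median([v for v in values if v is not None])
    return [(fill if v is None else v, v is None) for v in values]


def normalize_text(text):
    """Text features: lowercase, collapse whitespace, map missing text to ''."""
    return " ".join(text.lower().split()) if text else ""


def one_hot(values, missing="unknown"):
    """Categorical encoding: one-hot, treating a missing value as its own level."""
    filled = [missing if v is None else v for v in values]
    levels = sorted(set(filled))
    return levels, [[int(v == lvl) for lvl in levels] for v in filled]


def make_ai_ready(records):
    """Combine the steps above into one flat row per record."""
    hba1c = impute_median([r["hba1c"] for r in records])
    sex_levels, sex_rows = one_hot([r["sex"] for r in records])
    rows = []
    for rec, (value, was_missing), sex_bits in zip(records, hba1c, sex_rows):
        row = {
            "age_bin": bin_value(rec["age"], AGE_BINS),
            "hba1c": value,
            "hba1c_missing": int(was_missing),
            "notes": normalize_text(rec["notes"]),
        }
        # One indicator column per observed category, e.g. sex_F, sex_M.
        row.update({f"sex_{lvl}": bit for lvl, bit in zip(sex_levels, sex_bits)})
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in make_ai_ready(RECORDS):
        print(row)
```

Running the script prints one flat row per record with a binned age, an imputed HbA1c value plus its missingness flag, normalized free text, and one-hot sex indicators. In a real pipeline, the imputation median and the category levels would be learned on the training split only and then reused for validation and test data, so that no information leaks across splits.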

RESULTS

We applied the proposed methodology to transform raw clinical data from type 1 diabetes studies in the NIDDK-CR into a structured, AI-ready dataset. The evaluation process validated the effectiveness of our AI-readiness enhancement steps and explored potential use cases in type 1 diabetes research.

DISCUSSION

The methodology discussed in this paper will serve as guidance for preparing data for AI-driven clinical research, and the resulting AI-ready data will serve as a training tool for building AI/ML models and improving their performance.

CONCLUSION

We present a generalizable framework for preparing clinical research data for AI applications. The resulting datasets lay a strong foundation for downstream AI/ML applications, setting the stage for a new era of data-driven discoveries.

Similar articles

1. Preparing clinical research data for artificial intelligence readiness: insights from the National Institute of Diabetes and Digestive and Kidney Diseases data centric challenge.
   J Am Med Inform Assoc. 2025 Oct 1;32(10):1609-1616. doi: 10.1093/jamia/ocaf114.
2. Fostering Multidisciplinary Collaboration in Artificial Intelligence and Machine Learning Education: Tutorial Based on the AI-READI Bootcamp.
   JMIR Med Educ. 2025 Dec 29;11:e83154. doi: 10.2196/83154.
3. A roadmap to artificial intelligence (AI): Methods for designing and building AI ready data to promote fairness.
   J Biomed Inform. 2024 Jun;154:104654. doi: 10.1016/j.jbi.2024.104654. Epub 2024 May 11.
4. Data Preprocessing Techniques for AI and Machine Learning Readiness: Scoping Review of Wearable Sensor Data in Cancer Care.
   JMIR Mhealth Uhealth. 2024 Sep 27;12:e59587. doi: 10.2196/59587.
5. Streamlining medical software development with CARE lifecycle and CARE agent: an AI-driven technology readiness level assessment tool.
   BMC Med Inform Decis Mak. 2025 Jul 8;25(1):254. doi: 10.1186/s12911-025-03099-0.
6. Transforming Big Data into AI-ready data for nutrition and obesity research.
   Obesity (Silver Spring). 2024 May;32(5):857-870. doi: 10.1002/oby.23989. Epub 2024 Mar 1.
7. A practical guide for nephrologist peer reviewers: evaluating artificial intelligence and machine learning research in nephrology.
   Ren Fail. 2025 Dec;47(1):2513002. doi: 10.1080/0886022X.2025.2513002. Epub 2025 Jul 7.
8. The National Institute of Diabetes and Digestive and Kidney Diseases Central Repositories: a valuable resource for nephrology research.
   Clin J Am Soc Nephrol. 2015 Apr 7;10(4):710-5. doi: 10.2215/CJN.06570714. Epub 2014 Nov 6.
9. Data stewardship and curation practices in AI-based genomics and automated microscopy image analysis for high-throughput screening studies: promoting robust and ethical AI applications.
   Hum Genomics. 2025 Feb 23;19(1):16. doi: 10.1186/s40246-025-00716-x.
10. Consensus statements on the current landscape of artificial intelligence applications in endoscopy, addressing roadblocks, and advancing artificial intelligence in gastroenterology.
    Gastrointest Endosc. 2025 Jan;101(1):2-9.e1. doi: 10.1016/j.gie.2023.12.003. Epub 2024 Apr 17.
