Automating Construction of Machine Learning Models With Clinical Big Data: Proposal Rationale and Methods.

Authors

Luo Gang, Stone Bryan L, Johnson Michael D, Tarczy-Hornoch Peter, Wilcox Adam B, Mooney Sean D, Sheng Xiaoming, Haug Peter J, Nkoy Flory L

Affiliations

Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States.

Department of Pediatrics, University of Utah, Salt Lake City, UT, United States.

Publication information

JMIR Res Protoc. 2017 Aug 29;6(8):e175. doi: 10.2196/resprot.7757.

DOI: 10.2196/resprot.7757
PMID: 28851678
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5596298/

Abstract

BACKGROUND

To improve health outcomes and cut health care costs, we often need to conduct prediction/classification using large clinical datasets (also known as clinical big data), for example, to identify high-risk patients for preventive interventions. Machine learning has been proposed as a key technology for doing this. Machine learning has won most data science competitions and could support many clinical activities, yet only 15% of hospitals use it for even limited purposes. Despite their familiarity with data, health care researchers often lack the machine learning expertise needed to use clinical big data directly, which creates a hurdle in realizing value from their data. Health care researchers can work with data scientists who have deep machine learning knowledge, but it takes time and effort for both parties to communicate effectively. Facing a shortage of data scientists in the United States and hiring competition from companies with deep pockets, health care systems have difficulty recruiting data scientists. Building and generalizing a machine learning model often requires hundreds to thousands of manual iterations by data scientists to select the following: (1) hyper-parameter values and complex algorithms that greatly affect model accuracy and (2) operators and periods for temporally aggregating clinical attributes (eg, whether a patient's weight kept rising in the past year). This process becomes infeasible with limited budgets.
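As a rough illustration of item (2) above, the sketch below applies a few aggregation operators over two look-back periods to one patient's weight measurements. It is a minimal example under stated assumptions, not the Auto-ML implementation: the operator names, the `aggregate` helper, the 90-day and 365-day periods, and the sample data are all hypothetical.

```python
from datetime import date, timedelta

def kept_rising(values):
    """True if every value is strictly greater than the one before it."""
    return len(values) >= 2 and all(b > a for a, b in zip(values, values[1:]))

# Hypothetical operator set; the abstract does not list specific operators.
OPERATORS = {
    "max": max,
    "mean": lambda v: sum(v) / len(v),
    "kept_rising": kept_rising,
}

def aggregate(records, operator, period_days, as_of):
    """Apply one operator to the measurements taken within period_days before as_of."""
    start = as_of - timedelta(days=period_days)
    window = [value for day, value in sorted(records) if start <= day <= as_of]
    if not window:
        return None  # no measurement in this period
    return OPERATORS[operator](window)

# Made-up weight measurements (kg) for a single patient.
weights_kg = [
    (date(2016, 1, 10), 70.0),
    (date(2016, 9, 1), 74.0),
    (date(2017, 1, 15), 76.5),
    (date(2017, 6, 1), 79.0),
]
as_of = date(2017, 8, 1)

# One candidate feature per (operator, period) pair.
features = {
    (op, days): aggregate(weights_kg, op, days, as_of)
    for op in OPERATORS
    for days in (90, 365)
}
```

Each (operator, period) pair yields one candidate feature, so the number of candidates grows with the product of the two lists. That growth is one reason choosing them by hand can take many iterations.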

OBJECTIVE

This study's goal is to enable health care researchers to directly use clinical big data, make machine learning feasible with limited budgets and data scientist resources, and realize value from data.

METHODS

This study will allow us to achieve the following: (1) finish developing the new software, Automated Machine Learning (Auto-ML), to automate model selection for machine learning with clinical big data and validate Auto-ML on seven benchmark modeling problems of clinical importance; (2) apply Auto-ML and novel methodology to two new modeling problems crucial for care management allocation and pilot one model with care managers; and (3) perform simulations to estimate the impact of adopting Auto-ML on US patient outcomes.
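To make the idea of automated model selection concrete, the sketch below searches jointly over an algorithm and its hyper-parameter value and keeps the combination with the best validation accuracy. It uses plain random search on synthetic one-attribute data, which is a simple stand-in and not the search method used by Auto-ML. The two toy algorithms, the `SEARCH_SPACE` table, and all parameter ranges are assumptions made for illustration.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def threshold_predict(train, x, cutoff):
    """Predict the positive class when x exceeds a fixed cutoff (train is unused)."""
    return 1 if x > cutoff else 0

# Hypothetical search space: each algorithm paired with a sampler for its hyper-parameter.
SEARCH_SPACE = {
    "knn": (knn_predict, lambda rng: rng.choice([1, 3, 5, 7])),
    "threshold": (threshold_predict, lambda rng: rng.uniform(0.0, 10.0)),
}

def accuracy(predict, param, train, valid):
    """Fraction of validation points that the configured model labels correctly."""
    hits = sum(predict(train, x, param) == label for x, label in valid)
    return hits / len(valid)

def random_search(train, valid, trials, seed=0):
    """Sample (algorithm, hyper-parameter) pairs and return the best-scoring one."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        name = rng.choice(sorted(SEARCH_SPACE))
        predict, sample = SEARCH_SPACE[name]
        param = sample(rng)
        score = accuracy(predict, param, train, valid)
        if best is None or score > best[0]:
            best = (score, name, param)
    return best

# Synthetic data: the label is 1 when the single attribute exceeds 5.
data_rng = random.Random(42)
points = [(x, 1 if x > 5 else 0) for x in (data_rng.uniform(0, 10) for _ in range(200))]
train, valid = points[:150], points[150:]

best_score, best_name, best_param = random_search(train, valid, trials=30)
```

In practice the search space would cover far more algorithms and hyper-parameters, and a more sample-efficient strategy than random search would likely be preferred. The loop structure (sample a configuration, evaluate it, keep the best) stays the same.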

RESULTS

We are currently writing Auto-ML's design document. We intend to finish our study by around the year 2022.

CONCLUSIONS

Auto-ML will generalize to various clinical prediction/classification problems. With minimal help from data scientists, health care researchers can use Auto-ML to quickly build high-quality models. This will boost wider use of machine learning in health care and improve patient outcomes.
