Luo Gang, Stone Bryan L, Johnson Michael D, Tarczy-Hornoch Peter, Wilcox Adam B, Mooney Sean D, Sheng Xiaoming, Haug Peter J, Nkoy Flory L
Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States.
Department of Pediatrics, University of Utah, Salt Lake City, UT, United States.
JMIR Res Protoc. 2017 Aug 29;6(8):e175. doi: 10.2196/resprot.7757.
To improve health outcomes and cut health care costs, we often need to conduct prediction/classification using large clinical datasets (also known as clinical big data), for example, to identify high-risk patients for preventive interventions. Machine learning has been proposed as a key technology for doing this. Machine learning has won most data science competitions and could support many clinical activities, yet only 15% of hospitals use it for even limited purposes. Although health care researchers are familiar with their data, they often lack the machine learning expertise needed to use clinical big data directly, which is a hurdle to realizing value from those data. Health care researchers can work with data scientists who have deep machine learning knowledge, but it takes time and effort for both parties to communicate effectively. Facing a US shortage of data scientists and hiring competition from companies with deep pockets, health care systems have difficulty recruiting data scientists. Building and generalizing a machine learning model often requires hundreds to thousands of manual iterations by data scientists to select the following: (1) hyper-parameter values and complex algorithms that greatly affect model accuracy and (2) operators and periods for temporally aggregating clinical attributes (eg, whether a patient's weight kept rising in the past year). This process becomes infeasible with limited budgets.
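To make the idea of "operators and periods for temporally aggregating clinical attributes" concrete, the following minimal Python sketch applies one operator over one look-back period to a patient's time-stamped measurements. It is an illustration only, not Auto-ML's design: the names `aggregate`, `kept_rising`, and `OPERATORS`, the set of operators, and the weight values are all hypothetical.

```python
from datetime import date, timedelta


def kept_rising(values):
    """True if there are at least two values and each exceeds the one before it."""
    return len(values) >= 2 and all(a < b for a, b in zip(values, values[1:]))


# Hypothetical operator set; each maps a time-ordered list of values to one feature.
OPERATORS = {
    "max": max,
    "mean": lambda v: sum(v) / len(v),
    "kept_rising": kept_rising,
}


def aggregate(measurements, operator, period_days, as_of):
    """Apply one operator to the values recorded within period_days before as_of.

    measurements: list of (date, value) pairs in any order.
    Returns None when no measurement falls inside the window.
    """
    start = as_of - timedelta(days=period_days)
    window = [v for d, v in sorted(measurements) if start <= d <= as_of]
    if not window:
        return None
    return OPERATORS[operator](window)


# Made-up example values for a single patient (weight in kg).
weights_kg = [
    (date(2016, 1, 10), 80.0),
    (date(2016, 9, 1), 82.5),
    (date(2017, 1, 15), 84.0),
    (date(2017, 6, 1), 86.0),
]
as_of = date(2017, 8, 1)

# One (operator, period) choice per feature; each choice yields a different feature.
rising_past_year = aggregate(weights_kg, "kept_rising", 365, as_of)
max_past_90_days = aggregate(weights_kg, "max", 90, as_of)
```

Each (operator, period) pair produces a different candidate feature, so the number of combinations grows quickly with the number of clinical attributes, which is why choosing them by hand takes so many iterations.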
This study's goal is to enable health care researchers to directly use clinical big data, make machine learning feasible with limited budgets and data scientist resources, and realize value from data.
This study will allow us to achieve the following: (1) finish developing the new software, Automated Machine Learning (Auto-ML), to automate model selection for machine learning with clinical big data and validate Auto-ML on seven benchmark modeling problems of clinical importance; (2) apply Auto-ML and novel methodology to two new modeling problems crucial for care management allocation and pilot one model with care managers; and (3) perform simulations to estimate the impact of adopting Auto-ML on US patient outcomes.
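As a rough illustration of what automating model selection means, the sketch below searches jointly over an algorithm and its hyper-parameter values and keeps the combination with the best validation accuracy. It uses plain random search on synthetic one-feature data. This is an assumption made for illustration: the abstract does not state which search method Auto-ML uses, and the toy classifiers, the `SEARCH_SPACE` layout, and all function names here are hypothetical.

```python
import random


def threshold_classifier(train, threshold):
    """Toy model that ignores the training data: predict 1 when x exceeds threshold."""
    return lambda x: 1 if x > threshold else 0


def knn_classifier(train, k):
    """Toy model: majority label among the k training points nearest to x."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return 1 if 2 * sum(label for _, label in nearest) > len(nearest) else 0
    return predict


# Hypothetical search space: algorithm name -> (model builder, hyper-parameter sampler).
SEARCH_SPACE = {
    "threshold": (threshold_classifier, lambda rng: {"threshold": rng.uniform(0.0, 10.0)}),
    "knn": (knn_classifier, lambda rng: {"k": rng.choice([1, 3, 5, 7])}),
}


def accuracy(predict, data):
    """Fraction of (x, y) pairs that the predictor labels correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)


def random_search(train, valid, n_trials, seed=0):
    """Sample (algorithm, hyper-parameters) pairs; return the best (score, name, params)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        name = rng.choice(sorted(SEARCH_SPACE))
        build, sample = SEARCH_SPACE[name]
        params = sample(rng)
        score = accuracy(build(train, **params), valid)
        if best is None or score > best[0]:
            best = (score, name, params)
    return best


# Synthetic data (not clinical): label is 1 when the single feature exceeds 6.
data_rng = random.Random(42)
points = [data_rng.uniform(0.0, 10.0) for _ in range(200)]
data = [(x, 1 if x > 6.0 else 0) for x in points]
train, valid = data[:150], data[150:]

best_score, best_name, best_params = random_search(train, valid, n_trials=30)
print(best_name, best_params, best_score)
```

Every loop iteration in `random_search` stands in for one of the manual iterations a data scientist would otherwise perform. A practical system would also need cross-validation and a far larger search space than this toy example.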
We are currently writing Auto-ML's design document. We intend to finish our study by around 2022.
Auto-ML will generalize to various clinical prediction/classification problems. With minimal help from data scientists, health care researchers can use Auto-ML to quickly build high-quality models. This will boost wider use of machine learning in health care and improve patient outcomes.