Chambers David A, Amir Eitan, Saleh Ramy R, Rodin Danielle, Keating Nancy L, Osterman Travis J, Chen James L
1 Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Rockville, MD.
2 Division of Medical Oncology and Hematology, Princess Margaret Cancer Centre and Department of Medicine, University of Toronto, Toronto, Ontario, Canada.
Am Soc Clin Oncol Educ Book. 2019 Jan;39:e167-e175. doi: 10.1200/EDBK_238057. Epub 2019 May 17.
The concept of "big data" research (the aggregation and analysis of biologic, clinical, administrative, and other data sources to drive new advances in biomedical knowledge) has been embraced by the cancer research enterprise. Although much of the conversation has concentrated on the amalgamation of basic biologic data (e.g., genomics, metabolomics, tumor tissue), new opportunities to extend potential contributions of big data to clinical practice and policy abound. This article examines these opportunities through discussion of three major data sources: aggregated clinical trial data, administrative data (including insurance claims data), and data from electronic health records. We discuss the benefits of using these data to answer key oncology practice and policy research questions, along with the limitations inherent in these complex data sources. Finally, the article discusses overarching themes across data types and offers next steps for the research, practice, and policy communities. The use of multiple sources of big data promises to improve knowledge and provide more accurate data for clinicians and policy decision makers. In the future, optimization of machine learning may attenuate current limitations of big data analyses, thereby improving patient care and outcomes.