Raita Yoshihiko, Camargo Carlos A, Liang Liming, Hasegawa Kohei
Department of Emergency Medicine, Harvard Medical School, Massachusetts General Hospital, Boston, MA, United States.
Division of Rheumatology, Allergy, and Immunology, Department of Medicine, Harvard Medical School, Massachusetts General Hospital, Boston, MA, United States.
Front Med (Lausanne). 2021 Jul 6;8:678047. doi: 10.3389/fmed.2021.678047. eCollection 2021.
Clinicians handle a growing amount of clinical, biometric, and biomarker data. In this "big data" era, there is an emerging faith that the answer to all clinical and scientific questions resides in "big data" and that data will transform medicine into precision medicine. However, data by themselves are useless. It is the algorithms encoding causal reasoning and domain (e.g., clinical and biological) knowledge that prove transformative. The recent introduction of (health) data science presents an opportunity to rethink this data-centric view. For example, while precision medicine seeks to provide the right prevention and treatment strategy to the right patients at the right time, it cannot be realized by algorithms that operate exclusively in data-driven prediction modes, as most machine learning algorithms do. A better understanding of data science and its tasks is vital to interpret findings and translate new discoveries into clinical practice. In this review, we first discuss the principles and major tasks of data science by organizing it into three defining tasks: (1) association and prediction, (2) intervention, and (3) counterfactual causal inference. Second, we review commonly used data science tools with examples in the medical literature. Lastly, we outline current challenges and future directions in the fields of medicine, elaborating on how data science can enhance clinical effectiveness and inform medical practice. As machine learning algorithms become ubiquitous tools to handle quantitatively "big data," their integration with causal reasoning and domain knowledge is instrumental to qualitatively transform medicine, which will, in turn, improve health outcomes of patients.