Cui Chen, Chou Shinn-Huey S, Brattain Laura, Lehman Constance D, Samir Anthony E
Center for Ultrasound Research & Translation, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, MA 02114.
Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA.
AJR Am J Roentgenol. 2019 Jul;213(1):216-226. doi: 10.2214/AJR.18.20464. Epub 2019 Feb 19.
Data engineering is the foundation of effective machine learning model development and research. The accuracy and clinical utility of machine learning models fundamentally depend on the quality of the data used for model development. This article aims to provide radiologists and radiology researchers with an understanding of the core elements of data preparation for machine learning research. We cover key concepts from an engineering perspective, including databases, data integrity, and characteristics of data suitable for machine learning projects, and from a clinical perspective, including the Health Insurance Portability and Accountability Act (HIPAA), patient consent, avoidance of bias, and ethical concerns related to the potential to magnify health disparities. The focus of this article is women's imaging; nonetheless, the principles described apply to all domains of medical imaging. Machine learning research is inherently interdisciplinary: effective collaboration is critical for success. In medical imaging, radiologists possess knowledge essential for data engineers to develop useful datasets for machine learning model development.