Department of Radiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania.
Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts.
J Am Coll Radiol. 2023 Sep;20(9):836-841. doi: 10.1016/j.jacr.2023.06.015. Epub 2023 Jul 16.
Artificial intelligence (AI) continues to show great potential in disease detection and diagnosis on medical imaging with increasingly high accuracy. An important component of AI model creation is dataset development for training, validation, and testing. Diverse and high-quality datasets are critical to ensure robust and unbiased AI models that maintain validity, especially in traditionally underserved populations globally. Yet publicly available datasets demonstrate problems with quality and inclusivity. In this literature review, the authors evaluate publicly available medical imaging datasets for demographic, geographic, genetic, and disease representation, or lack thereof, and call for an increased emphasis on dataset development to maximize the impact of AI models.