Obuchowski Nancy A, Bullen Jennifer A
Quantitative Health Sciences/JJN3, Cleveland Clinic Foundation, 9500 Euclid Ave, Cleveland, OH, 44195, USA.
Contemp Clin Trials Commun. 2019 Aug 22;16:100434. doi: 10.1016/j.conctc.2019.100434. eCollection 2019 Dec.
Artificial intelligence (AI), as applied to medical images to detect, rule out, diagnose, and stage disease, has seen enormous growth over the last few years. AI algorithms have multiple use cases in medical imaging: first-reader (or concurrent) mode, second-reader mode, triage mode, and, more recently, prescreening mode, in which an AI algorithm is applied to the worklist of images to identify obvious negative cases so that human readers do not need to review them and can focus on interpreting the remaining cases. In this paper we describe the statistical considerations for designing a study to test a new AI prescreening algorithm for identifying normal lung cancer screening CTs. We contrast agreement vs. accuracy studies, and retrospective vs. prospective designs. We evaluate various test performance metrics with respect to their sensitivity to changes in the AI algorithm's performance, as well as to shifts in reader behavior in response to a revised worklist. We consider sample size requirements for testing the AI prescreening algorithm.
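As a rough illustration of the kind of test performance metrics an accuracy study of a prescreening algorithm might report, the sketch below computes a few common quantities from a 2x2 table of AI decision against reference standard. The abstract does not say which metrics the authors evaluated, so the metric set, the names `PrescreenCounts` and `prescreen_metrics`, and the counts in the usage example are all hypothetical assumptions, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PrescreenCounts:
    """Hypothetical 2x2 table: AI prescreening decision vs. reference standard."""
    removed_normal: int  # AI called "normal", truly no cancer (removed from worklist)
    removed_cancer: int  # AI called "normal", truly cancer (never reaches a reader)
    kept_normal: int     # left on the worklist, truly no cancer
    kept_cancer: int     # left on the worklist, truly cancer


def prescreen_metrics(c: PrescreenCounts) -> dict:
    """Return illustrative prescreening metrics (an assumed set, not the paper's)."""
    total = c.removed_normal + c.removed_cancer + c.kept_normal + c.kept_cancer
    removed = c.removed_normal + c.removed_cancer
    cancers = c.removed_cancer + c.kept_cancer
    normals = c.removed_normal + c.kept_normal
    if min(total, removed, cancers, normals) == 0:
        raise ValueError("every margin of the 2x2 table must be non-zero")
    return {
        # share of the worklist that readers no longer need to review
        "workload_reduction": removed / total,
        # among cases the AI removed, the share that are truly normal
        "npv": c.removed_normal / removed,
        # share of cancers that remain on the worklist for human review
        "sensitivity": c.kept_cancer / cancers,
        # share of normal cases that the AI correctly removed
        "specificity": c.removed_normal / normals,
    }


if __name__ == "__main__":
    # Hypothetical counts for 1000 screening CTs, 10 of which show cancer.
    example = PrescreenCounts(removed_normal=400, removed_cancer=1,
                              kept_normal=590, kept_cancer=9)
    for name, value in prescreen_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

With these made-up counts, the negative predictive value stays close to 1 even though one of the ten cancers is removed from the worklist, while sensitivity falls to 0.9. This is one way to see why the choice of metric matters when judging how responsive it is to changes in algorithm performance.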