Dreizin David, Zhang Lei, Sarkar Nathan, Bodanapally Uttam K, Li Guang, Hu Jiazhen, Chen Haomin, Khedr Mustafa, Khetan Udit, Campbell Peter, Unberath Mathias
Department of Diagnostic Radiology and Nuclear Medicine, School of Medicine, University of Maryland, Baltimore, MD, United States.
Johns Hopkins University, Baltimore, MD, United States.
Front Radiol. 2023;3. doi: 10.3389/fradi.2023.1202412. Epub 2023 Jul 11.
Precision-medicine quantitative tools for cross-sectional imaging require painstaking labeling of targets that vary considerably in volume, which prohibits scaling data annotation efforts and supervised training to the large datasets needed for robust and generalizable clinical performance. A straightforward time-saving strategy involves manual editing of AI-generated labels, which we call AI-collaborative labeling (AICL). Factors affecting the efficacy and utility of such an approach are unknown, the resulting reduction in time effort is not well documented, and edited AI labels may be prone to automation bias.
In this pilot study, using a cohort of CT scans with intracavitary hemorrhage, we evaluate both the time savings and the quality of AICL labels, and propose criteria that must be met for AICL annotations to serve as a high-throughput, high-quality ground truth.
Fifty-seven CT scans of patients with traumatic intracavitary hemorrhage were included. No participant recruited for this study had previously interpreted the scans. nnU-net models trained on small existing datasets for each target feature (hemothorax, hemoperitoneum, and pelvic hematoma; n = 77-253) were used in inference. Two common scenarios served as baseline comparisons: expert manual labeling, and expert editing of trained-staff labels. Parameters included time effort and label quality, graded by a blinded independent expert on a 9-point Likert scale. The observer also attempted to discriminate AICL and expert labels in a random subset (n = 18). Data were compared with ANOVA and post hoc paired signed-rank tests with Bonferroni correction.
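For readers wishing to reproduce a similar inference pipeline, the following is a minimal sketch of batch prediction with pretrained nnU-net models, assuming the nnU-Net v1 command-line interface; the task identifiers, folder layout, and 3d_fullres configuration are hypothetical placeholders, not the authors' actual setup.

```python
# Illustrative sketch (not the authors' code): run one pretrained
# nnU-Net v1 model per target feature, as described in the methods.
import subprocess

# Hypothetical task IDs and folder names for the three features.
tasks = {
    "hemothorax":      "Task501_Hemothorax",
    "hemoperitoneum":  "Task502_Hemoperitoneum",
    "pelvic_hematoma": "Task503_PelvicHematoma",
}

for feature, task in tasks.items():
    subprocess.run(
        [
            "nnUNet_predict",
            "-i", f"data/{feature}/imagesTs",   # input CT volumes (NIfTI)
            "-o", f"predictions/{feature}",     # AI-generated label maps
            "-t", task,
            "-m", "3d_fullres",                 # assumed model configuration
        ],
        check=True,
    )
```

The resulting label maps would then be loaded into a segmentation editor for the manual-correction (AICL) step.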
AICL reduced time effort 2.8-fold compared with staff-label editing, and 8.7-fold compared with expert labeling (corrected p < 0.0006). Mean Likert grades for AICL labels (8.4, SD: 0.6) were significantly higher than for expert labels (7.8, SD: 0.9) and edited staff labels (7.7, SD: 0.8) (corrected p < 0.0006). The independent observer failed to correctly discriminate AI and human labels.
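A minimal sketch of the reported statistical comparison, assuming SciPy; the per-scan timing arrays below are synthetic stand-ins, and the omnibus one-way ANOVA followed by paired Wilcoxon signed-rank tests with Bonferroni correction mirrors the analysis described in the methods.

```python
# Illustrative sketch (not the authors' code): compare annotation time
# across the three labeling conditions, then run post hoc paired
# signed-rank tests with Bonferroni correction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_scans = 57
# Hypothetical per-scan annotation times (minutes) for each condition.
expert_manual = rng.normal(35, 8, n_scans)
edited_staff  = rng.normal(11, 3, n_scans)
aicl          = rng.normal(4, 1.5, n_scans)

# Omnibus ANOVA across the three conditions.
f_stat, p_anova = stats.f_oneway(expert_manual, edited_staff, aicl)
print(f"ANOVA: F={f_stat:.2f}, p={p_anova:.2e}")

# Post hoc paired Wilcoxon signed-rank tests, Bonferroni-corrected.
pairs = {
    "AICL vs expert":         (aicl, expert_manual),
    "AICL vs edited staff":   (aicl, edited_staff),
    "edited staff vs expert": (edited_staff, expert_manual),
}
for name, (a, b) in pairs.items():
    w, p = stats.wilcoxon(a, b)
    p_corr = min(p * len(pairs), 1.0)  # Bonferroni correction
    print(f"{name}: W={w:.1f}, corrected p={p_corr:.2e}")
```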
For our use case and annotators, AICL facilitates rapid large-scale curation of high-quality ground truth. The proposed quality control regime can be employed by other investigators prior to embarking on AICL for segmentation tasks in large datasets.
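To illustrate the discrimination component of the proposed quality-control regime, here is a small sketch, assuming SciPy (>= 1.7) and synthetic observer guesses; it tests whether a blinded observer's accuracy at telling AI from human labels exceeds chance.

```python
# Illustrative sketch (not the authors' code): a Turing-test-style QC
# step in which a blinded observer guesses the origin of each label in
# a shuffled subset, evaluated against chance with a binomial test.
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(1)
n = 18                            # size of the random subset
truth   = rng.integers(0, 2, n)   # 1 = AICL label, 0 = human label
guesses = rng.integers(0, 2, n)   # observer's blinded guesses (synthetic)

correct = int((truth == guesses).sum())
res = binomtest(correct, n, p=0.5, alternative="greater")
print(f"{correct}/{n} correct; p = {res.pvalue:.3f} vs. chance")
```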