
An investigation into the risk of population bias in deep learning autocontouring.

Affiliations

Mirada Medical Ltd, Oxford, United Kingdom.

University of Groningen, University Medical Center Groningen, Department of Radiation Oncology, Groningen, The Netherlands.

Publication information

Radiother Oncol. 2023 Sep;186:109747. doi: 10.1016/j.radonc.2023.109747. Epub 2023 Jun 16.

Abstract

BACKGROUND AND PURPOSE

To date, data used in the development of Deep Learning-based automatic contouring (DLC) algorithms have been largely sourced from single geographic populations. This study aimed to evaluate the risk of population-based bias by determining whether the performance of an autocontouring system is impacted by geographic population.

MATERIALS AND METHODS

Eighty deidentified head and neck CT scans were collected from four clinics in Europe (n = 2) and Asia (n = 2). A single observer manually delineated 16 organs-at-risk in each scan. The scans were then contoured using a DLC solution trained on single-institution (European) data. Autocontours were compared to the manual delineations using quantitative measures, and a Kruskal-Wallis test was used to test for differences between populations. Observers from each participating institution assessed the clinical acceptability of the automatic and manual contours in a blinded subjective evaluation.
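
The abstract names the Kruskal-Wallis test but gives no implementation details. Below is a minimal standard-library sketch of the usual tie-corrected H statistic. The p-value uses the common chi-squared approximation with k - 1 degrees of freedom, which is 3 for four clinics, and that approximation is normally considered adequate when each group has roughly five or more observations. The function names are hypothetical. In practice `scipy.stats.kruskal` computes the same statistic.

```python
import math
from itertools import chain


def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more samples."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    # Sum over groups of (rank sum)^2 / group size.
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tie groups of size t.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
```

To use it for one organ, pass one list of per-patient metric values per clinic, for example `h = kruskal_wallis_h(a, b, c, d)`, and then compute `chi2_sf_3df(h)` to obtain the approximate p-value for four groups.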

RESULTS

Seven organs showed a statistically significant difference in volume between groups, and four organs showed statistically significant differences in the quantitative similarity measures. In the qualitative test, acceptance of the contours varied more between observers than between data of different origins, with higher acceptance by the South Korean observers.

CONCLUSION

Much of the statistical difference in quantitative performance could be explained by two factors: differences in organ volume, which affect the contour similarity measures, and the small sample size. However, the qualitative assessment suggests that observer perception bias has a greater impact on apparent clinical acceptability than the quantitatively observed differences do. Future investigations of potential geographic bias should include more patients, populations and anatomical regions.
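
The volume argument can be illustrated with a toy example. It relies on assumptions that the abstract does not make. First, the Dice similarity coefficient is assumed as the contour similarity measure, because the abstract does not name the measures used. Second, organs are modelled as spheres on a unit voxel grid. With these assumptions, the same one-voxel over-contouring lowers Dice much more for a small organ than for a large one. `sphere_voxels` and `dice_coefficient` are hypothetical helper names.

```python
import math


def sphere_voxels(radius):
    """Integer voxel coordinates inside a sphere of `radius` centred at the origin.

    Assumption (toy model): an organ is an ideal sphere on a 1-unit voxel grid.
    """
    r = math.ceil(radius)
    return {
        (x, y, z)
        for x in range(-r, r + 1)
        for y in range(-r, r + 1)
        for z in range(-r, r + 1)
        if x * x + y * y + z * z <= radius * radius
    }


def dice_coefficient(a, b):
    """Dice = 2|A & B| / (|A| + |B|) for two voxel sets (assumed similarity measure)."""
    if not a and not b:
        return 1.0  # two empty contours agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


# Same absolute boundary error for every organ: the "auto" contour is 1 voxel larger in radius.
for radius in (3, 6, 12):
    manual = sphere_voxels(radius)
    auto = sphere_voxels(radius + 1)
    print(f"radius {radius:>2}: {len(manual):>5} voxels, Dice = {dice_coefficient(manual, auto):.3f}")
```

The boundary error is identical in every case. Only the organ size changes, yet the Dice values rise with radius. This pattern is consistent with the abstract's point that volume differences between populations can produce differences in similarity scores.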