Data clustering to select clinically-relevant test cases for algorithm benchmarking and characterization.

Author information

Department of Physics and Astronomy, University of Calgary, 2500 University Dr NW, Calgary, Alberta, T2N 1N4, Canada. Department of Medical Physics, Tom Baker Cancer Centre, 1331 29 St NW, Calgary, Alberta, T2N 4N2, Canada. Author to whom any correspondence should be addressed.

Publication information

Phys Med Biol. 2020 Mar 6;65(5):055014. doi: 10.1088/1361-6560/ab6e54.

Abstract

Algorithm benchmarking and characterization are an important part of algorithm development and validation prior to clinical implementation. However, benchmarking may be limited to a small collection of test cases because establishing 'ground-truth' references is resource-intensive. This study proposes a framework for selecting test cases to assess algorithm and workflow equivalence. Effective test case selection may minimize the number of ground-truth comparisons required to establish robust and clinically relevant benchmarking and characterization results. To demonstrate the proposed framework, we clustered differences between two independent workflows estimating during-treatment dose objective violations for 15 head and neck cancer patients (15 planning CTs, 105 on-unit CBCTs). Each workflow used a different deformable image registration algorithm to estimate inter-fractional anatomy and contour changes. The Hopkins statistic tested whether workflow output was inherently clustered, and k-medoid clustering formalized cluster assignment. Further statistical analyses verified the relevance of clusters to algorithm output. Data at cluster centers ('medoids') were treated as candidate test cases representative of workflow-relevant algorithm differences. The framework indicated that differences in estimated dose objective violations were naturally grouped (Hopkins = 0.75, providing 90% confidence). K-medoid clustering identified five clusters that stratified workflow differences (MANOVA: p < 0.001) in estimated parotid gland D50%, spinal cord/brainstem Dmax, and high dose CTV coverage dose violations (Kendall's tau: p < 0.05). Systematic algorithm differences resulting in workflow discrepancies were: parotid gland volumes (ANOVA: p < 0.001), external contour deformations (t-test: p = 0.022), and CTV-to-PTV margins (t-test: p = 0.009), respectively. Five candidate test cases were verified as representative of the five clusters. The framework successfully clustered workflow outputs and identified five test cases representative of clinically relevant algorithm discrepancies. This approach may improve the allocation of resources during the benchmarking and characterization process and the applicability of results to clinical data.
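
The two clustering steps named in the abstract, a Hopkins test for clustering tendency followed by k-medoid cluster assignment, can be sketched roughly as below. This is an illustration only, not the authors' implementation, which the abstract does not describe. The function names (`pairwise_dist`, `hopkins`, `k_medoids`), the synthetic two-dimensional data, and the sample size `m` are all assumptions. The Hopkins variant shown uses raw nearest-neighbour distances (some definitions raise them to the power of the data dimension) and follows the convention in which values near 0.5 suggest uniform data and values near 1 suggest clustering. The k-medoid routine is a simple alternating update rather than the full PAM algorithm, so it may settle in a local optimum.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distances between every row of a and every row of b."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))

def hopkins(X, m=None, seed=0):
    """Hopkins statistic (unpowered variant): ~0.5 for uniform data, near 1 if clustered."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = m or max(1, n // 10)  # assumed sample size: 10% of the data
    # u: distance from each of m uniform points (in the bounding box) to its nearest real point
    U = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    u = pairwise_dist(U, X).min(axis=1)
    # w: distance from each of m sampled real points to its nearest *other* real point
    idx = rng.choice(n, size=m, replace=False)
    D = pairwise_dist(X[idx], X)
    D[np.arange(m), idx] = np.inf  # exclude each sampled point's distance to itself
    w = D.min(axis=1)
    return u.sum() / (u.sum() + w.sum())

def k_medoids(X, k, n_iter=100, seed=0):
    """Alternating k-medoids: returns (row indices of medoids, cluster label per row)."""
    rng = np.random.default_rng(seed)
    D = pairwise_dist(X, X)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)  # assign each point to its nearest medoid
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # new medoid: the member with the smallest total distance to its cluster
                within = D[np.ix_(members, members)].sum(axis=1)
                new[j] = members[within.argmin()]
        if np.array_equal(new, medoids):
            break  # converged: medoids unchanged
        medoids = new
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels

# Synthetic stand-in data (NOT the study's data): three tight 2-D blobs.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.1 * rng.standard_normal((30, 2)) for c in centers])

print("Hopkins statistic:", round(hopkins(X), 3))
medoids, labels = k_medoids(X, k=3)
print("Medoid row indices (candidate test cases):", medoids)
```

Because a medoid is always an actual data row, the returned indices point directly at real cases, which is what makes medoids usable as candidate test cases in a framework of this kind.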

