Inconsistency in the use of the term "validation" in studies reporting the performance of deep learning algorithms in providing diagnosis from medical imaging.

Affiliations

Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.

Department of Radiology, National Cancer Center, Goyang, Republic of Korea.

Publication information

PLoS One. 2020 Sep 11;15(9):e0238908. doi: 10.1371/journal.pone.0238908. eCollection 2020.

Abstract

BACKGROUND

The development of deep learning (DL) algorithms is a three-step process: training, tuning, and testing. Studies use the term "validation" inconsistently, some to mean tuning and others to mean testing, which hinders accurate communication and may inadvertently exaggerate the performance of DL algorithms. We investigated the extent of this inconsistency in studies on the accuracy of DL algorithms in providing diagnosis from medical imaging.
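To make the three steps concrete, the following is a minimal sketch of a three-way data split, assuming a simple random partition. The function name `three_way_split`, the fractions, and the seed are hypothetical choices for illustration and are not taken from the paper. The variable names avoid the word "validation" on purpose, since that is the term the study found to be ambiguous.

```python
import random


def three_way_split(items, tune_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle items and partition them into training, tuning, and test sets.

    Hypothetical helper for illustration only; the fractions are assumptions.
    """
    if tune_frac < 0 or test_frac < 0 or tune_frac + test_frac >= 1:
        raise ValueError("tune_frac and test_frac must be >= 0 and sum to < 1")

    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility

    n_test = round(len(shuffled) * test_frac)
    n_tune = round(len(shuffled) * tune_frac)

    test = shuffled[:n_test]                    # held out; used once, at the end
    tuning = shuffled[n_test:n_test + n_tune]   # used for hyperparameter choices
    training = shuffled[n_test + n_tune:]       # used to fit model parameters
    return training, tuning, test


training, tuning, test = three_way_split(range(100))
print(len(training), len(tuning), len(test))
```

Performance measured on the tuning set tends to be optimistic because model choices were made against it, which is why reporting it under the same label as test-set performance can overstate results. In imaging studies a split is commonly made at the patient level rather than the image level; this sketch does not handle that.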

METHODS AND FINDINGS

We analyzed the full texts of research papers cited in two recent systematic reviews. The papers were categorized according to whether the term "validation" was used to refer to tuning alone, both tuning and testing, or testing alone. We analyzed whether paper characteristics (i.e., journal category, field of study, year of print publication, journal impact factor [JIF], and nature of test data) were associated with the usage of the terminology using multivariable logistic regression analysis with generalized estimating equations. Of 201 papers published in 125 journals, 118 (58.7%), 9 (4.5%), and 74 (36.8%) used the term to refer to tuning alone, both tuning and testing, and testing alone, respectively. A weak association was noted between higher JIF and using the term to refer to testing (i.e., testing alone or both tuning and testing) instead of tuning alone (vs. JIF <5; JIF 5 to 10: adjusted odds ratio 2.11, P = 0.042; JIF >10: adjusted odds ratio 2.41, P = 0.089). Journal category, field of study, year of print publication, and nature of test data were not significantly associated with the terminology usage.
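The reported proportions can be checked with a few lines of arithmetic. This is only a sanity check on the counts quoted above, not a reproduction of the regression analysis; the dictionary keys and the helper `pct` are illustrative names.

```python
# Counts of papers by how they used the term "validation", as reported above.
counts = {
    "tuning alone": 118,
    "both tuning and testing": 9,
    "testing alone": 74,
}

total = sum(counts.values())


def pct(n, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * n / denominator, 1)


percentages = {label: pct(n, total) for label, n in counts.items()}

# Papers that used the term to refer to testing in any form
# (the outcome category compared against "tuning alone" in the regression).
testing_any = counts["both tuning and testing"] + counts["testing alone"]

print(total)                    # 201
print(percentages)              # 58.7, 4.5, 36.8
print(testing_any, pct(testing_any, total))  # 83 41.3
```

The three categories sum to the 201 papers, and about 41.3% of papers used "validation" to refer to testing either alone or together with tuning.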

CONCLUSIONS

The existing literature is substantially inconsistent in its use of the term "validation" when referring to the steps in DL algorithm development. Efforts are needed to improve the accuracy and clarity of terminology usage.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdfa/7485764/e463ab07e2f8/pone.0238908.g001.jpg
