Methodologic Guide for Evaluating Clinical Performance and Effect of Artificial Intelligence Technology for Medical Diagnosis and Prediction.

Affiliations

From the Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul 05505, South Korea (S.H.P.); and Department of Radiology, Research Institute of Radiological Science, Yonsei University College of Medicine, Seoul, South Korea (K.H.).

Publication information

Radiology. 2018 Mar;286(3):800-809. doi: 10.1148/radiol.2017171920. Epub 2018 Jan 8.

Abstract

The use of artificial intelligence in medicine is currently an issue of great interest, especially with regard to the diagnostic or predictive analysis of medical images. Adoption of an artificial intelligence tool in clinical practice requires careful confirmation of its clinical utility. Herein, the authors explain key methodologic points involved in a clinical evaluation of artificial intelligence technology for use in medicine, mainly from the standpoints of clinical epidemiology and biostatistics, with emphasis on high-dimensional or overparameterized diagnostic or predictive models that use artificial deep neural networks. First, statistical methods for assessing the discrimination and calibration performance of a diagnostic or predictive model are summarized. Next, the effects of disease manifestation spectrum and disease prevalence on the performance results are explained. This is followed by a discussion of the difference between evaluating performance with internal and external datasets; the importance of using an adequate external dataset obtained from a well-defined clinical cohort, to avoid overestimating clinical performance as a result of spectrum bias and of overfitting in high-dimensional or overparameterized classification models; and the essentials for achieving a more robust clinical evaluation. Finally, the authors review the role of clinical trials and observational outcome studies in the ultimate clinical verification of diagnostic or predictive artificial intelligence tools through patient outcomes, beyond performance metrics, and explain how to design such studies. © RSNA, 2018.
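The abstract separates discrimination (how well a model ranks diseased above non-diseased cases) from calibration (how closely predicted risks match observed event rates), and notes that disease prevalence changes the results. The following is a minimal Python sketch, not taken from the paper, of how these quantities are commonly computed: the area under the ROC curve as the Mann-Whitney probability, a simple difference-based form of calibration-in-the-large, the Brier score, and positive and negative predictive values from Bayes' theorem. All function names and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Discrimination: probability that a randomly chosen diseased case (label 1)
    scores higher than a randomly chosen non-diseased case (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one diseased and one non-diseased case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Calibration (simple difference form): mean predicted risk minus observed
    event rate; 0 means the model is correct on average."""
    return sum(probs) / len(probs) - sum(labels) / len(labels)


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """PPV and NPV via Bayes' theorem: with sensitivity and specificity held
    fixed, the predictive values still shift with disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    # Hypothetical model outputs and reference-standard labels.
    probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0, 0]
    print(f"AUC = {auc_mann_whitney(probs, labels):.3f}")
    print(f"calibration-in-the-large = {calibration_in_the_large(probs, labels):+.3f}")
    print(f"Brier score = {brier_score(probs, labels):.3f}")
    # Same sensitivity/specificity, different prevalence (e.g., enriched
    # case-control set versus a screening population).
    for prev in (0.5, 0.05):
        ppv, npv = predictive_values(0.9, 0.9, prev)
        print(f"prevalence {prev:.2f}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With a sensitivity and specificity of 0.9, the PPV computed this way falls from 0.90 at 50% prevalence to about 0.32 at 5% prevalence, which illustrates why performance measured on an enriched dataset may not carry over to a clinical cohort. Calibration is often assessed more fully with calibration plots or logistic recalibration; the difference form here is only the simplest summary.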
