Lee Jung-Oh, Zhou Hong-Yu, Berzin Tyler M, Sodickson Daniel K, Rajpurkar Pranav
Department of Radiology, Seoul National University Hospital, Seoul, Republic of Korea.
Department of Biomedical Informatics, Harvard Medical School, Boston, USA.
NPJ Digit Med. 2025 May 13;8(1):273. doi: 10.1038/s41746-025-01649-4.
This perspective proposes adapting video-text generative AI to 3D medical imaging (CT/MRI) and medical videos (endoscopy/laparoscopy) by treating 3D images as videos. The approach leverages modern video models to analyze multiple sequences simultaneously and provide real-time AI assistance during procedures. The paper examines medical imaging's unique characteristics (synergistic information, metadata, and world model), outlines applications in automated reporting, case retrieval, and education, and addresses challenges of limited datasets, benchmarks, and specialized training.
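As a rough illustration of the "3D images as videos" idea, the sketch below turns a volumetric scan into an ordered stack of 8-bit frames, the kind of input a generic video model might accept. It is not taken from the paper: the function name `volume_to_frames`, the intensity window, and the synthetic volume are all assumptions made for illustration.

```python
import numpy as np


def volume_to_frames(volume, window=(-1000.0, 1000.0)):
    """Map a (depth, height, width) volume to uint8 frames, one per slice.

    `window` is a hypothetical intensity range (for example Hounsfield units
    for CT); values outside it are clipped before rescaling to 0..255.
    """
    vol = np.asarray(volume, dtype=np.float32)
    if vol.ndim != 3:
        raise ValueError("expected a 3D array shaped (depth, height, width)")
    lo, hi = window
    vol = np.clip(vol, lo, hi)
    # Each slice along axis 0 becomes one video frame, in scan order.
    return ((vol - lo) / (hi - lo) * 255.0).round().astype(np.uint8)


# Synthetic stand-in for a CT volume: 16 slices of 64x64 voxels.
rng = np.random.default_rng(0)
ct = rng.uniform(-1000, 1000, size=(16, 64, 64))
frames = volume_to_frames(ct)
```

Here the slice axis plays the role of time, so `frames[i]` is the i-th frame. A real pipeline would also need to handle voxel spacing, orientation, and multiple sequences, which this sketch leaves out.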