Rozhyna Anastasiia, Somfai Gábor Márk, Atzori Manfredo, DeBuc Delia Cabrera, Saad Amr, Zoellin Jay, Müller Henning
Informatics Institute, University of Applied Sciences Western Switzerland (HES-SO), 3960 Sierre, Switzerland.
Medical Informatics, University of Geneva, 1205 Geneva, Switzerland.
Diagnostics (Basel). 2024 Aug 1;14(15):1668. doi: 10.3390/diagnostics14151668.
Artificial intelligence has transformed medical diagnostics, particularly through medical image analysis. AI algorithms detect abnormalities with strong performance, enabling computer-aided diagnosis through the analysis of large amounts of patient data. These data are the foundation on which algorithms learn and make predictions. The importance of data therefore cannot be overstated, and clinically relevant datasets are required. Many researchers face a lack of medical data due to limited access, privacy concerns, or the absence of annotations. Optical Coherence Tomography (OCT) is one of the most widely used diagnostic tools in ophthalmology, so addressing the data availability issue is crucial for enhancing AI applications in OCT diagnostics. This review aims to provide a comprehensive analysis of all publicly accessible retinal OCT datasets. Our main objective is to compile a list of OCT datasets and their properties that can serve as an accessible reference and facilitate data curation for medical image analysis tasks. For this review, we searched the Zenodo repository, the Mendeley Data repository, the MEDLINE database, and the Google Dataset Search engine. We systematically evaluated all identified datasets and found 23 open-access datasets containing OCT images, which vary considerably in size, scope, and ground-truth labels. Our findings indicate a need for better data-sharing practices and standardized documentation. Improving the availability and quality of OCT datasets will support the development of AI algorithms and ultimately improve diagnostic capabilities in ophthalmology. By providing a comprehensive list of accessible OCT datasets, this review aims to facilitate the use and development of AI in medical image analysis.