SCSFish2025: a large dataset from South China sea for coral reef fish identification.

Author information

Wang Meng, Xiao Wei, Wang Ying, Jia Houlei, Gao Yang, Chen Zhiguang, Zheng Fudan

Affiliations

Nansha Islands Coral Reef Ecosystem National Observation and Research Station, Ministry of Natural Resources, No. 155 Xingangxi Road, Guangzhou, 510300, China.

South China Sea Ecological Center, Ministry of Natural Resources, No. 155 Xingangxi Road, Guangzhou, 510300, China.

Publication information

Sci Rep. 2025 Aug 17;15(1):30091. doi: 10.1038/s41598-025-14785-4.

Abstract

Coral reefs are among the most biodiverse ecosystems on Earth and are extremely important to the marine environment. However, coral reefs are degrading rapidly worldwide, and in-situ online monitoring systems are therefore being deployed to monitor coral reef ecosystems in real time. At the same time, artificial intelligence, particularly deep learning, is playing an increasingly important role in coral reef ecology, especially in the automatic detection and identification of coral reef fish. However, deep learning is an inherently data-driven technique that depends on high-quality training datasets, and existing fish identification datasets suffer from low resolution and inaccurate labeling, which limits the application of deep learning to coral reef fish identification. To better exploit deep learning for real-time automatic detection and identification of coral reef fish in data collected by in-situ online monitoring systems, this paper presents SCSFish2025, a high-resolution, species-rich, and well-labeled coral reef fish dataset and the first publicly available coral reef fish dataset from the waters of China's Nansha Islands. SCSFish2025 contains 11,956 high-resolution underwater surveillance images with over 120,000 bounding boxes covering 30 fish species, all manually labeled by experienced fish identification experts and annotated with sub-category labels for blurring, occlusion, and altered pose. Furthermore, this paper establishes a benchmark for the dataset by evaluating four state-of-the-art or representative object detection models as baselines. The best baseline, RT-DETRv2, achieves an mAP@50 of 0.9960 under five-fold cross-validation on the training set and 0.7486 on the independent test set. The release of this dataset will help advance AI-based automatic detection and identification of coral reef fish and provide strong support for research on marine biodiversity and ecosystems. The project code and dataset are available at https://github.com/FudanZhengSYSU/SCSFish2025 .
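
The benchmark results above are reported as mAP@50. To illustrate what that metric measures, here is a minimal Python sketch: for each species, detections are ranked by confidence, a detection counts as a true positive if its IoU with a not-yet-matched ground-truth box of the same species is at least 0.5, AP is the area under the resulting precision-recall curve, and mAP@50 averages AP over species. It assumes (x1, y1, x2, y2) pixel boxes and VOC-style greedy matching with all-point interpolation. The abstract does not say which evaluation toolkit the authors used (COCO-style tools, for example, use 101-point interpolation), so this sketch is not guaranteed to reproduce the published numbers. All function names and the toy class names are hypothetical, not taken from the SCSFish2025 repository.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed box convention: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]
# A detection: (image_id, confidence score, box).
Detection = Tuple[str, float, Box]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """All-point interpolated AP: area under the non-increasing precision envelope."""
    r = [0.0, *recalls, 1.0]
    p = [0.0, *precisions, 0.0]
    for i in range(len(p) - 2, -1, -1):  # sweep right to left to build the envelope
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def ap_for_class(gts: Dict[str, List[Box]], dets: List[Detection], iou_thr: float = 0.5) -> float:
    """AP for one species; gts maps image_id -> ground-truth boxes of that species."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp = fp = 0
    recalls, precisions = [], []
    # Highest-confidence detections claim ground-truth boxes first.
    for img, _score, box in sorted(dets, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # VOC-style rule: hitting an already-claimed box counts as a false positive.
        if best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    return average_precision(recalls, precisions)


def map_at_50(gts_by_class: Dict[str, Dict[str, List[Box]]],
              dets_by_class: Dict[str, List[Detection]]) -> float:
    """Mean of per-species AP at an IoU threshold of 0.5."""
    aps = [ap_for_class(gts_by_class[c], dets_by_class.get(c, []), 0.5) for c in gts_by_class]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Toy data with placeholder class names, not real SCSFish2025 labels.
    gts = {
        "species_a": {"img1": [(0, 0, 10, 10), (20, 20, 30, 30)]},
        "species_b": {"img2": [(5, 5, 15, 15)]},
    }
    dets = {
        "species_a": [("img1", 0.9, (0, 0, 10, 10)),     # TP (IoU 1.0)
                      ("img1", 0.8, (50, 50, 60, 60)),   # FP (no overlap)
                      ("img1", 0.7, (21, 21, 31, 31))],  # TP (IoU 81/119)
        "species_b": [("img2", 0.95, (5, 5, 15, 15))],   # TP
    }
    print(f"{map_at_50(gts, dets):.4f}")  # prints 0.9167
```

In the toy run, species_a gets AP = 5/6 because the false positive ranked second lowers precision before the second fish is found, while species_b gets AP = 1.0. Because each ground-truth box can be claimed only once, a duplicate detection of an already-found fish also lowers precision even when its IoU exceeds 0.5.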

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee7d/12358537/95021e19a70d/41598_2025_14785_Fig1_HTML.jpg
