How many specimens make a sufficient training set for automated three-dimensional feature extraction?

Author information

Mulqueeney James M, Searle-Barnes Alex, Brombacher Anieke, Sweeney Marisa, Goswami Anjali, Ezard Thomas H G

Affiliations

School of Ocean & Earth Science, National Oceanography Centre Southampton, University of Southampton Waterfront Campus, Southampton, UK.

Department of Life Sciences, Natural History Museum, London, UK.

Publication information

R Soc Open Sci. 2024 Jun 19;11(6). doi: 10.1098/rsos.240113. eCollection 2024 Jun.

Abstract

Deep learning has emerged as a robust tool for automating feature extraction from three-dimensional images, offering an efficient alternative to labour-intensive and potentially biased manual image segmentation methods. However, there has been limited exploration of optimal training set sizes, including whether artificial expansion by data augmentation can achieve consistent results in less time and how consistent these benefits are across different types of traits. In this study, we manually segmented 50 planktonic foraminifera specimens from the genus to determine the minimum number of training images required to produce accurate volumetric and shape data from internal and external structures. Unsurprisingly, the results show that deep learning models improve with a larger number of training images, with eight specimens being required to achieve 95% accuracy. Furthermore, data augmentation can enhance network accuracy by up to 8.0%. Notably, predicting both volumetric and shape measurements for the internal structure is more challenging than for the external structure, owing to low contrast between different materials and increased geometric complexity. These results provide novel insight into optimal training set sizes for precise image segmentation of diverse traits and highlight the potential of data augmentation for enhancing multivariate feature extraction from three-dimensional images.
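
The abstract does not say which accuracy metric or which augmentations were used, so the following is only a minimal sketch under stated assumptions: segmentation accuracy is taken to be the Dice overlap between predicted and manually segmented voxels, augmentation is taken to be a mirror flip of a labelled volume along one axis, and the "sufficient" training set is taken to be the smallest size whose score reaches a threshold. The names `dice`, `flip_voxels` and `min_training_size` are hypothetical helpers introduced for illustration, not the authors' code.

```python
from typing import Dict, Optional, Set, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) index of a labelled voxel


def dice(pred: Set[Voxel], truth: Set[Voxel]) -> float:
    """Dice overlap between two voxel sets (assumed accuracy metric)."""
    if not pred and not truth:
        return 1.0  # two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def flip_voxels(voxels: Set[Voxel], shape: Tuple[int, int, int], axis: int) -> Set[Voxel]:
    """Mirror a labelled volume along one axis (one assumed augmentation)."""
    flipped = set()
    for voxel in voxels:
        coords = list(voxel)
        coords[axis] = shape[axis] - 1 - coords[axis]
        flipped.add(tuple(coords))
    return flipped


def min_training_size(scores: Dict[int, float], threshold: float = 0.95) -> Optional[int]:
    """Smallest training-set size whose score meets the threshold, or None."""
    passing = [size for size, score in scores.items() if score >= threshold]
    return min(passing) if passing else None
```

In use, `scores` would map each training-set size to the mean score of a network trained on that many specimens, with and without augmented copies, so the two learning curves can be compared against the same threshold.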

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/156e/11296157/f0d3ea050597/rsos.240113.f001.jpg
