Mimura Kazuhide, Itaki Takuya, Kataoka Hirokatsu, Miyakawa Ayumu
Geological Survey of Japan, National Institute of Advanced Industrial Science and Technology, 1-1-1 Higashi, Tsukuba, Ibaraki, 305-8567, Japan.
Estuary Research Center, Shimane University, 1060 Nishikawatu-cho, Matsue, Shimane, 690-8504, Japan.
Sci Rep. 2025 Mar 6;15(1):7189. doi: 10.1038/s41598-025-90988-z.
While deep learning techniques, especially deep-learning-based image classification, continue to evolve, it has been noted that there is a large time lag in applying these techniques to geological studies. Recently, the vision transformer (ViT), a new architecture that offers an alternative to convolutional neural networks (CNNs), has attracted considerable attention. In addition, it has been proposed that pre-training classification models on mathematically generated images instead of real images, an approach called formula-driven supervised learning (FDSL), achieves comparable or even higher performance in visual understanding. In this study, we applied these new techniques to the classification of microfossils (radiolarians). Compared with a previous CNN model, the ViT-based model achieved 6–8% higher average precision. On average, the precision of the FDSL pre-trained models was slightly higher than that of the models pre-trained on real images. We therefore propose that these techniques may be suitable for image classification tasks in geology.
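To make the FDSL idea concrete, here is a minimal sketch of how a formula-driven pre-training set can be produced without any photographs or manual labels. The abstract does not say which formula family the authors used; this sketch assumes a FractalDB-style setup, in which each class is one iterated function system (IFS) of random 2-D affine maps, images are rendered with the chaos game, and the label is simply the index of the generating formula. The function names (`sample_contractive_ifs`, `render_ifs`, `make_fdsl_dataset`), the contractivity rejection step, the |det A| map-selection weights, the horizontal-flip variation, and all default sizes are hypothetical illustrative choices, not the paper's settings.

```python
import math
import random


def sample_contractive_ifs(n_maps, rng):
    """Draw n_maps affine maps w(x, y) = (a*x + b*y + e, c*x + d*y + f).

    Parameters are uniform in [-1, 1]. A map is re-drawn until its linear part
    A = [[a, b], [c, d]] is contractive (largest singular value < 1), which keeps
    the attractor bounded. This rejection rule is an assumption for the sketch.
    """
    maps = []
    while len(maps) < n_maps:
        a, b, c, d, e, f = (rng.uniform(-1.0, 1.0) for _ in range(6))
        s = a * a + b * b + c * c + d * d  # trace(A^T A)
        det = a * d - b * c
        # Largest eigenvalue of A^T A, i.e. the squared largest singular value of A.
        sigma_max_sq = (s + math.sqrt(max(0.0, s * s - 4.0 * det * det))) / 2.0
        if sigma_max_sq < 1.0:
            maps.append((a, b, c, d, e, f))
    return maps


def render_ifs(maps, size, n_points, rng, burn_in=100):
    """Render the IFS attractor with the chaos game into a size x size binary image."""
    # Choose maps with probability proportional to |det A| (a common heuristic).
    weights = [abs(a * d - b * c) + 1e-12 for a, b, c, d, _, _ in maps]
    choices = rng.choices(range(len(maps)), weights=weights, k=n_points + burn_in)
    x = y = 0.0
    pts = []
    for i, k in enumerate(choices):
        a, b, c, d, e, f = maps[k]
        x, y = a * x + b * y + e, c * x + d * y + f
        if i >= burn_in:  # skip points that have not yet settled onto the attractor
            pts.append((x, y))
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    x0, y0 = min(xs), min(ys)
    sx = max(max(xs) - x0, 1e-12)  # guard against a degenerate (zero-width) attractor
    sy = max(max(ys) - y0, 1e-12)
    img = [[0] * size for _ in range(size)]
    for px, py in pts:
        col = int((px - x0) / sx * (size - 1))
        row = int((py - y0) / sy * (size - 1))
        img[row][col] = 1
    return img


def make_fdsl_dataset(n_classes, n_instances, size=32, n_maps=3,
                      n_points=5000, seed=0):
    """Toy formula-driven dataset: one IFS per class, label = formula index."""
    rng = random.Random(seed)
    images, labels = [], []
    for label in range(n_classes):
        maps = sample_contractive_ifs(n_maps, rng)
        for _ in range(n_instances):
            img = render_ifs(maps, size, n_points, rng)
            if rng.random() < 0.5:  # random horizontal flip as a simple instance variation
                img = [row[::-1] for row in img]
            images.append(img)
            labels.append(label)
    return images, labels


if __name__ == "__main__":
    X, y = make_fdsl_dataset(n_classes=3, n_instances=2)
    print(len(X), y)  # 6 [0, 0, 1, 1, 2, 2]
```

The point of the design is that the labels come for free: the generator knows which formula produced each image. In an FDSL workflow, a classifier (CNN or ViT) would be pre-trained to predict these formula indices and then fine-tuned on the real target images; that training step is outside this sketch.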