Guo Zhongliang, Arandjelović Ognjen, Reid David, Lei Yaxiong, Büttner Jochen
School of Computer Science, University of St Andrews, Scotland KY16 9AJ, UK.
Max Planck Institute for the History of Science, Boltzmannstraße 22, 14195 Berlin, Germany
J Imaging. 2023 May 25;9(6):107. doi: 10.3390/jimaging9060107.
Ancient numismatics, the study of ancient coins, has in recent years become an attractive domain for the application of computer vision and machine learning. Though the field is rich in research problems, the predominant focus to date has been on the task of attributing a coin from an image, that is, of identifying its issue. This may be considered the cardinal problem in the field, and it continues to challenge automatic methods. In the present paper, we address a number of limitations of previous work. Firstly, existing methods approach the problem as a classification task. As such, they are unable to deal with classes with no or few exemplars (which would be most, given that there are over 50,000 issues of Roman Imperial coins alone), and they require retraining whenever exemplars of a new class become available. Hence, rather than seeking to learn a representation that distinguishes one class from all others, we seek a representation that is best at distinguishing classes from one another, thus relinquishing the demand for exemplars of every class. This leads us to adopt the paradigm of pairwise coin matching by issue, rather than the usual classification paradigm, and to the specific solution we propose in the form of a Siamese neural network. Furthermore, while adopting deep learning, motivated by its successes in the field and its unchallenged superiority over classical computer vision approaches, we also seek to leverage the advantages that transformers hold over the previously employed convolutional neural networks, in particular their non-local attention mechanisms, which ought to be especially useful in ancient coin analysis by associating semantically, but not visually, related distal elements of a coin's design.
Evaluated on a large data corpus of 14,820 images and 7605 issues, using transfer learning and only a small training set of 542 images of 24 issues, our Double Siamese ViT model is shown to surpass the state of the art by a large margin, achieving an overall accuracy of 81%. Moreover, our further investigation of the results shows that the majority of the method's errors do not stem from intrinsic aspects of the algorithm itself but are rather a consequence of unclean data, a problem that can be easily addressed in practice by simple pre-processing and quality checking.
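The pairwise-matching paradigm described above can be illustrated with a minimal sketch. This is not the paper's Double Siamese ViT; it only shows the core idea of a Siamese comparison, namely that both coin images pass through the same shared encoder and the match is scored by the similarity of their embeddings. Here a fixed random linear projection stands in for the pretrained vision transformer encoder, and the `embed` and `same_issue_score` names are illustrative assumptions, not the authors' API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the shared encoder: a fixed random linear projection.
# In the paper's setup this role is played by a vision transformer
# fine-tuned, via transfer learning, on a small labelled set of issues.
W = rng.standard_normal((64 * 64, 64))

def embed(image: np.ndarray) -> np.ndarray:
    """Map a 64x64 grayscale image to an L2-normalised 64-d embedding."""
    v = image.reshape(-1) @ W
    return v / np.linalg.norm(v)

def same_issue_score(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Siamese comparison: both inputs go through the SAME encoder;
    cosine similarity of the embeddings scores the putative match.
    A threshold on this score decides 'same issue' vs 'different issue',
    so no exemplars of a particular issue are needed at training time."""
    return float(embed(img_a) @ embed(img_b))

coin1 = rng.random((64, 64))
coin2 = rng.random((64, 64))

print(same_issue_score(coin1, coin1))  # identical images score 1.0
print(same_issue_score(coin1, coin2))
```

Because only the comparison function is learned, a previously unseen issue can be matched against reference images without any retraining, which is precisely the advantage over the classification formulation that the abstract highlights.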