Ma Haibo, Wang Chaobo, Li Ang, Xu Aide, Han Dong
Library, Panjin Campus of Dalian University of Technology, Panjin 124000, China.
School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China.
Sensors (Basel). 2024 Dec 14;24(24):7996. doi: 10.3390/s24247996.
Book localization is crucial for intelligent book inventory systems, and it depends on high-precision detection of book spines. However, the varying tilt angles and diverse aspect ratios of books on library shelves often reduce the effectiveness of conventional object detection algorithms. To address these challenges, this study proposes an enhanced Oriented R-CNN algorithm for book spine detection. First, we replace the standard 3 × 3 convolutions in ResNet50's residual blocks with deformable convolutions to improve the network's capacity for modeling the geometric deformations of book spines. Second, we integrate a PAFPN (Path Aggregation Feature Pyramid Network) into the neck to strengthen multi-scale feature fusion. Third, to optimize the anchor box design, we introduce an adaptive initial cluster center selection method for K-median clustering, which yields anchor box aspect ratios that better match the book spine dataset and improves the model's training performance. In comparison experiments against other state-of-the-art models on the book spine dataset, the proposed approach reaches an mAP of 90.22%, outperforming the baseline algorithm by 4.47 percentage points. These results indicate that the method is effective for identifying book spines in real-world library environments.
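The anchor design step can be illustrated with a minimal sketch of K-median clustering over box aspect ratios. The abstract does not describe the adaptive initial center selection, so the quantile-based initialization below is a stand-in and not the authors' method. The function names, the height/width definition of aspect ratio, and the sample boxes are all hypothetical.

```python
from statistics import median


def quantile_init(values, k):
    """Pick k initial centers at evenly spaced quantiles of the sorted data.

    Assumption: this is a simple stand-in for the paper's adaptive selection,
    whose details are not given in the abstract.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, int((i + 0.5) * n / k))] for i in range(k)]


def k_median_1d(values, k, max_iter=100):
    """Cluster scalar values with K-median (L1 distance, median update)."""
    centers = quantile_init(values, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to the nearest center by absolute distance.
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Update each center to the median of its cluster;
        # keep the old center when a cluster ends up empty.
        updated = [median(c) if c else centers[j] for j, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


# Hypothetical (width, height) pairs of spine boxes in pixels, not real data.
boxes = [(30, 300), (28, 310), (32, 290), (50, 250), (55, 260), (48, 240),
         (90, 270), (100, 280), (95, 300)]
# Assumption: aspect ratio is taken as height / width.
ratios = [h / w for w, h in boxes]
anchor_ratios = k_median_1d(ratios, k=3)
print([round(r, 2) for r in anchor_ratios])
```

The median update makes the centers less sensitive to a few extreme spines than a mean update would be, which is one plausible reason to prefer K-median for anchor ratios. The resulting values would then be supplied as the anchor ratios in the detector configuration.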