Enhancing book genre classification with BERT and InceptionV3: a deep learning approach for libraries.

Author information

Yang Xinting, Zhang Zehua

Affiliations

Library, Lanzhou University, Lanzhou, Gansu Province, China.

Department of Physics Science and Technology, Lanzhou University, Lanzhou, Gansu Province, China.

Publication information

PeerJ Comput Sci. 2025 Jun 5;11:e2934. doi: 10.7717/peerj-cs.2934. eCollection 2025.

DOI:10.7717/peerj-cs.2934

PMID:40567686

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12193415/

Abstract

Accurate book genre classification is essential for library organization, information retrieval, and personalized recommendations. Traditional classification methods, often reliant on manual categorization and metadata-based approaches, struggle with the complexities of hybrid genres and evolving literary trends. To address these limitations, this study proposes a hybrid deep learning model that integrates visual and textual features for enhanced genre classification. Specifically, we employ InceptionV3, an advanced convolutional neural network architecture, to extract visual features from book cover images and bidirectional encoder representations from transformers (BERT) to analyze textual data from book titles. A scaled dot-product attention mechanism is used to effectively fuse these multimodal features, dynamically weighting their contributions based on contextual relevance. Experimental results on the BookCover30 dataset demonstrate that our proposed model outperforms baseline approaches, achieving a balanced accuracy of 0.7951 and an F1-score of 0.7920, surpassing both standalone image- and text-based classifiers. This study highlights the potential of deep learning in improving automated genre classification, offering a scalable and adaptable solution for libraries and digital platforms. Future research may focus on expanding dataset diversity, optimizing computational efficiency, and addressing biases in classification models.
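
The fusion step described in the abstract can be illustrated with a minimal sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, applied to one image embedding and one text embedding. The abstract does not give the fusion details, so everything here is an assumption made for illustration: the two embeddings are treated as a two-token sequence that attends to itself, no learned query/key/value projections are used, the attended tokens are averaged into one vector, and the helper names (`scaled_dot_product_attention`, `fuse`) and the toy 4-dimensional vectors are hypothetical. It is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors.

    Returns the attended vectors and the attention weights per query.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append(
            [sum(w_j * v[i] for w_j, v in zip(w, values))
             for i in range(len(values[0]))]
        )
    return outputs, weights


def fuse(image_vec, text_vec):
    """Hypothetical fusion: self-attention over the two modality tokens,
    followed by averaging the attended tokens into a single vector.
    Assumes both vectors were already projected to the same dimension."""
    tokens = [image_vec, text_vec]
    attended, weights = scaled_dot_product_attention(tokens, tokens, tokens)
    fused = [sum(column) / len(attended) for column in zip(*attended)]
    return fused, weights


# Toy embeddings standing in for projected InceptionV3 (cover image)
# and BERT (title) features; the values are made up for illustration.
image_features = [0.2, -0.1, 0.5, 0.3]
text_features = [0.4, 0.0, -0.2, 0.1]

fused, weights = fuse(image_features, text_features)
print("attention weights:", weights)
print("fused vector:", fused)
```

Each row of `weights` sums to 1 and shows how strongly one modality attends to itself and to the other. In a trained model the queries, keys, and values would come from learned linear projections, and the fused vector would feed a classification layer over the genre labels.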
