An open dataset for oracle bone character recognition and decipherment.

Affiliations

Huazhong University of Science and Technology, Wuhan, 430074, China.

The University of Adelaide, SA, Adelaide, 5005, Australia.

Publication information

Sci Data. 2024 Sep 6;11(1):976. doi: 10.1038/s41597-024-03807-x.

DOI:10.1038/s41597-024-03807-x
PMID:39242622
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11379903/
Abstract

Oracle bone script, one of the earliest known forms of ancient Chinese writing, presents invaluable research materials for scholars studying the humanities and geography of the Shang Dynasty, dating back 3,000 years. The immense historical and cultural significance of these writings cannot be overstated. However, the passage of time has obscured much of their meaning, presenting a significant challenge in deciphering these ancient texts. With the advent of Artificial Intelligence (AI), employing AI to assist in deciphering Oracle Bone Characters (OBCs) has become a feasible option. Yet, progress in this area has been hindered by a lack of high-quality datasets. To address this issue, this paper details the creation of the HUST-OBC dataset. This dataset encompasses 77,064 images of 1,588 individual deciphered characters and 62,989 images of 9,411 undeciphered characters, with a total of 140,053 images, compiled from diverse sources. The hope is that this dataset could inspire and assist future research in deciphering those unknown OBCs. All the codes and datasets are available at https://github.com/Pengjie-W/HUST-OBC .
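The image and character counts quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the `Subset` class and the helper names are hypothetical, and nothing here reflects the layout of the HUST-OBC repository itself. It confirms that the two subsets sum to the stated total and shows how differently the images are spread across deciphered and undeciphered characters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subset:
    """One part of the dataset, using the counts quoted in the abstract."""
    name: str
    images: int
    characters: int

    @property
    def images_per_character(self) -> float:
        # Mean number of images available for each distinct character.
        return self.images / self.characters


# Figures copied from the abstract; the structure is an illustrative assumption.
SUBSETS = [
    Subset("deciphered", images=77_064, characters=1_588),
    Subset("undeciphered", images=62_989, characters=9_411),
]


def total_images(subsets) -> int:
    """Sum the image counts across all subsets."""
    return sum(s.images for s in subsets)


def summarize(subsets) -> str:
    """Build a short plain-text summary of the dataset composition."""
    lines = [
        f"{s.name}: {s.images} images, {s.characters} characters, "
        f"{s.images_per_character:.1f} images per character"
        for s in subsets
    ]
    lines.append(f"total: {total_images(subsets)} images")
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize(SUBSETS))
```

On these figures, a deciphered character has about 48.5 images on average, while an undeciphered one has about 6.7, so any model trained on the undeciphered portion has far fewer examples per character to work with.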

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd90/11379903/c6d040fa8002/41597_2024_3807_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd90/11379903/658a94682042/41597_2024_3807_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd90/11379903/539112ed19ad/41597_2024_3807_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd90/11379903/cbb11f75c435/41597_2024_3807_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd90/11379903/d4773136785a/41597_2024_3807_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd90/11379903/06955e1edb62/41597_2024_3807_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd90/11379903/f19b94ba72cc/41597_2024_3807_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd90/11379903/4c44fc176303/41597_2024_3807_Fig8_HTML.jpg

Similar articles

1
An open dataset for oracle bone character recognition and decipherment.
Sci Data. 2024 Sep 6;11(1):976. doi: 10.1038/s41597-024-03807-x.
2
Study on the evolution of Chinese characters based on few-shot learning: From oracle bone inscriptions to regular script.
PLoS One. 2022 Aug 19;17(8):e0272974. doi: 10.1371/journal.pone.0272974. eCollection 2022.
3
Ancient Chinese Character Recognition with Improved Swin-Transformer and Flexible Data Enhancement Strategies.
Sensors (Basel). 2024 Mar 28;24(7):2182. doi: 10.3390/s24072182.
4
Investigating the tool marks on oracle bones inscriptions from the Yinxu site (ca., 1319-1046 BC), Henan province, China.
Microsc Res Tech. 2016 Sep;79(9):827-32. doi: 10.1002/jemt.22705. Epub 2016 Jun 22.
5
A dataset of oracle characters for benchmarking machine learning algorithms.
Sci Data. 2024 Jan 18;11(1):87. doi: 10.1038/s41597-024-02933-w.
6
Unsupervised Structure-Texture Separation Network for Oracle Character Recognition.
IEEE Trans Image Process. 2022;31:3137-3150. doi: 10.1109/TIP.2022.3165989. Epub 2022 Apr 20.
7
Research on denoising method of chinese ancient character image based on chinese character writing standard model.
Sci Rep. 2022 Nov 17;12(1):19795. doi: 10.1038/s41598-022-24388-y.
8
A Classification Method of Oracle Materials Based on Local Convolutional Neural Network Framework.
IEEE Comput Graph Appl. 2020 May-Jun;40(3):32-44. doi: 10.1109/MCG.2020.2973109. Epub 2020 Feb 20.
9
Dataset for studying gender disparity in English literary texts.
Data Brief. 2022 Feb 2;41:107905. doi: 10.1016/j.dib.2022.107905. eCollection 2022 Apr.
10
GHCR-A dataset for Grantha handwritten character recognition.
Data Brief. 2024 Aug 6;56:110783. doi: 10.1016/j.dib.2024.110783. eCollection 2024 Oct.

Cited by

1
A large-scale dataset for Chinese historical document recognition and analysis.
Sci Data. 2025 Jan 29;12(1):169. doi: 10.1038/s41597-025-04495-x.
