

Reading Akkadian cuneiform using natural language processing.

Affiliations

Faculty of Social Sciences and Humanities, Digital Humanities Ariel Lab, Ariel University, Ariel, Israel.

School of Computer Sciences, Tel Aviv University, Tel Aviv, Israel.

Publication information

PLoS One. 2020 Oct 28;15(10):e0240511. doi: 10.1371/journal.pone.0240511. eCollection 2020.

DOI:10.1371/journal.pone.0240511
PMID:33112872
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7592802/
Abstract

In this paper we present a new method for automatic transliteration and segmentation of Unicode cuneiform glyphs using Natural Language Processing (NLP) techniques. Cuneiform is one of the earliest known writing systems in the world, and it documents millennia of human civilization in the ancient Near East. Hundreds of thousands of cuneiform texts were found in the nineteenth and twentieth centuries CE, most of them written in Akkadian. However, tens of thousands of texts are still unpublished. We use models based on machine learning algorithms, such as recurrent neural networks (RNNs), to automatically transliterate standard Unicode cuneiform glyphs and segment them into words, reaching an accuracy of up to 97%. Our method and results therefore form a major step towards a human-machine interface for creating digitized editions. Our code, Akkademia, is publicly available as a web application, a Python package, and a GitHub repository.
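As a rough illustration of the task the abstract describes, the sketch below treats transliteration plus segmentation as per-sign labeling: each cuneiform sign receives a reading and a tag saying whether it starts a new word, and the labels are then decoded into a conventional transliteration (hyphens inside a word, spaces between words). The `reading|tag` label format with `B`/`I` tags, the helper names `parse_label`, `decode` and `sign_accuracy`, and the toy example `a-na šar-ri` ("to the king") are all assumptions for illustration, not Akkademia's actual encoding or API. The model that would predict the labels (for example an RNN tagger) is not shown, and per-sign accuracy is only one plausible reading of the reported figure.

```python
from typing import List, Tuple

# Hypothetical label format (an assumption, not Akkademia's encoding):
# "<reading>|<tag>", where tag "B" means the sign begins a new word
# and tag "I" means it continues the current word.


def parse_label(label: str) -> Tuple[str, str]:
    """Split a hypothetical "reading|tag" label into its two parts."""
    reading, tag = label.rsplit("|", 1)
    if tag not in ("B", "I"):
        raise ValueError(f"unknown segmentation tag: {tag!r}")
    return reading, tag


def decode(labels: List[str]) -> str:
    """Turn per-sign labels into a transliteration: hyphens join signs
    inside a word, spaces separate words."""
    words: List[List[str]] = []
    for label in labels:
        reading, tag = parse_label(label)
        # A "B" tag, or an "I" tag with no open word yet, starts a new word.
        if tag == "B" or not words:
            words.append([reading])
        else:
            words[-1].append(reading)
    return " ".join("-".join(word) for word in words)


def sign_accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of signs whose predicted label (reading and tag) equals the gold label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("expected two non-empty label sequences of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Toy example: "a-na šar-ri" ("to the king"), four signs forming two words.
    gold = ["a|B", "na|I", "šar|B", "ri|I"]
    predicted = ["a|B", "na|I", "šar|B", "ru|I"]  # one wrong reading
    print(decode(gold))                    # a-na šar-ri
    print(decode(predicted))               # a-na šar-ru
    print(sign_accuracy(predicted, gold))  # 0.75
```

Scoring the full label rather than the reading alone means a correct reading with a wrong word boundary still counts as an error; reporting reading accuracy and boundary accuracy separately would be an equally reasonable design.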


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb8b/7592802/7f5b0b4848c2/pone.0240511.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb8b/7592802/631909372b53/pone.0240511.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb8b/7592802/a8239a6df475/pone.0240511.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb8b/7592802/0f97d9d318e9/pone.0240511.g004.jpg

Similar articles

1. Reading Akkadian cuneiform using natural language processing.
   PLoS One. 2020 Oct 28;15(10):e0240511. doi: 10.1371/journal.pone.0240511. eCollection 2020.
2. Translating Akkadian to English with neural machine translation.
   PNAS Nexus. 2023 May 2;2(5):pgad096. doi: 10.1093/pnasnexus/pgad096. eCollection 2023 May.
3. Deep learning of cuneiform sign detection with weak supervision using transliteration alignment.
   PLoS One. 2020 Dec 16;15(12):e0243039. doi: 10.1371/journal.pone.0243039. eCollection 2020.
4. Restoration of fragmentary Babylonian texts using recurrent neural networks.
   Proc Natl Acad Sci U S A. 2020 Sep 15;117(37):22743-22751. doi: 10.1073/pnas.2003794117. Epub 2020 Sep 1.
5. Identification of patients with carotid stenosis using natural language processing.
   Eur Radiol. 2020 Jul;30(7):4125-4133. doi: 10.1007/s00330-020-06721-z. Epub 2020 Feb 26.
6. Birth malformations in Babylon and Assyria.
   Am J Med Genet. 2000 Apr 10;91(4):318-21. doi: 10.1002/(sici)1096-8628(20000410)91:4<318::aid-ajmg14>3.0.co;2-c.
7. Create distinctive databases of ancient languages and using a computer vision model to accurately recognize and classify them.
   Data Brief. 2024 Aug 10;56:110809. doi: 10.1016/j.dib.2024.110809. eCollection 2024 Oct.
8. A scale space approach for automatically segmenting words from historical handwritten documents.
   IEEE Trans Pattern Anal Mach Intell. 2005 Aug;27(8):1212-25. doi: 10.1109/TPAMI.2005.150.
9. Defining a Preprocessing Pipeline for the MULTI-SITA Project and General Medical Italian Natural Language Data.
   Stud Health Technol Inform. 2023 Oct 20;309:48-52. doi: 10.3233/SHTI230737.
10. Recurrent Deep Network Models for Clinical NLP Tasks: Use Case with Sentence Boundary Disambiguation.
   Stud Health Technol Inform. 2019 Aug 21;264:198-202. doi: 10.3233/SHTI190211.

Cited by

1. Translating Akkadian to English with neural machine translation.
   PNAS Nexus. 2023 May 2;2(5):pgad096. doi: 10.1093/pnasnexus/pgad096. eCollection 2023 May.
