DAT-MT Accelerated Graph Fusion Dependency Parsing Model for Small Samples in Professional Fields.

Authors

Li Rui, Shu Shili, Wang Shunli, Liu Yang, Li Yanhao, Peng Mingjun

Affiliations

State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430072, China.

Wuhan Geomatics Institute, Wuhan 430079, China.

Publication information

Entropy (Basel). 2023 Oct 12;25(10):1444. doi: 10.3390/e25101444.

DOI:10.3390/e25101444
PMID:37895565
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10606639/
Abstract

The rapid development of information technology has pushed the volume of information in massive text collections far beyond what humans can absorb intuitively, and dependency parsing is an effective way to cope with this overload. As domains become more specialized, two factors determine the efficiency of syntactic analysis: how well syntactic treebanks transfer to a new domain, and how fast the parsing model runs. To enable domain transfer of syntactic treebanks and to speed up text parsing, this paper proposes the Double-Array Trie and Multi-threading (DAT-MT) accelerated graph fusion dependency parsing model. The model combines specialized syntactic features from a small-scale professional-field corpus with generalized syntactic features from a large-scale news corpus, which improves the accuracy of syntactic relation recognition. The DAT-MT method addresses the high space and time complexity introduced by the graph fusion model: it rapidly maps massive Chinese character features to the model's prior parameters and parallelizes the computation, thereby improving parsing speed. Experimental results show that the unlabeled attachment score (UAS) and the labeled attachment score (LAS) of the model improve by 13.34% and 14.82% over a model that uses only the professional-field corpus, and by 3.14% and 3.40% over a model that uses only the news corpus; both scores also exceed those of the deep-learning-based DDParser and LTP 4. In addition, the method achieves a speedup of about 3.7 times over a baseline that uses a red-black tree index and a single thread. Efficient and accurate syntactic analysis will benefit the real-time processing of massive texts in professional fields, in tasks such as multi-dimensional semantic correlation, professional feature extraction, and domain knowledge graph construction.
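The abstract does not give implementation details for DAT-MT, so the following is only a minimal sketch of the general idea: a double-array trie that maps feature strings to weights through array lookups, plus a thread pool that scores sentences in parallel. The class `DoubleArrayTrie`, the functions `score_sentence` and `score_all`, and the feature strings and weights are all hypothetical and are not taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor


class DoubleArrayTrie:
    """Static double-array trie mapping string keys to values (illustrative)."""

    def __init__(self, items):
        # Step 1: build an ordinary trie over UTF-8 bytes.
        # Transition codes are byte + 1, so code 0 can mark "a key ends here".
        root = {}
        for key, val in items.items():
            node = root
            for byte in key.encode("utf-8"):
                node = node.setdefault(byte + 1, {})
            node[0] = val
        # Step 2: lay the trie out in parallel arrays.
        self.base = [0]      # base[s] + code = slot of the child state
        self.check = [0]     # check[t] = parent state of slot t, -1 if free
        self.value = [None]  # value[s] = payload if a key ends at state s
        stack = [(0, root)]  # the root is state 0
        while stack:
            state, node = stack.pop()
            self.value[state] = node.get(0)
            codes = [c for c in node if c != 0]
            if not codes:
                continue
            b = self._find_base(codes)
            self.base[state] = b
            for c in codes:
                self.check[b + c] = state  # claim the slot for this child
                stack.append((b + c, node[c]))

    def _find_base(self, codes):
        """Return the smallest offset b >= 1 whose child slots are all free."""
        b = 1
        while True:
            grow = b + max(codes) + 1 - len(self.check)
            if grow > 0:
                self.base += [0] * grow
                self.check += [-1] * grow
                self.value += [None] * grow
            if all(self.check[b + c] == -1 for c in codes):
                return b
            b += 1

    def get(self, key, default=None):
        """Follow one array transition per byte; no string comparisons."""
        s = 0
        for byte in key.encode("utf-8"):
            t = self.base[s] + byte + 1
            if t >= len(self.check) or self.check[t] != s:
                return default
            s = t
        val = self.value[s]
        return default if val is None else val


def score_sentence(trie, features):
    """Sum the weights of the features the (hypothetical) model knows."""
    return sum(trie.get(f, 0.0) for f in features)


def score_all(trie, sentences, workers=4):
    """Score sentences concurrently; the trie is read-only, so no locks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda feats: score_sentence(trie, feats), sentences))


# Hypothetical feature weights, for illustration only.
weights = {"规划/NN": 0.5, "规划/VV": 0.25, "用地/NN": 1.0}
trie = DoubleArrayTrie(weights)
scores = score_all(trie, [["规划/NN", "用地/NN"], ["未知", "规划/VV"]])
```

Each lookup costs one array transition per byte of the key, independent of how many keys are stored, whereas a balanced tree index needs a logarithmic number of key comparisons. In CPython the global interpreter lock keeps pure-Python threads from running in parallel, so this thread pool only shows the structure of the computation; a real speedup would need an implementation where threads can execute concurrently.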

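For readers unfamiliar with the two metrics reported in the abstract, here is a minimal sketch of how UAS and LAS are usually computed: UAS counts tokens whose predicted head is correct, and LAS additionally requires the relation label to match. The function name `attachment_scores`, the example sentence, and the relation labels are assumptions for illustration and do not come from the paper.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for one or more sentences.

    gold and pred are equal-length lists with one (head_index, label)
    pair per token.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equal in length")
    total = len(gold)
    # UAS: the predicted head index matches the gold head index.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head index and the relation label match.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return uas_hits / total, las_hits / total


# Hypothetical four-token sentence; head index 0 denotes the root.
gold = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "amod"), (2, "iobj")]
uas, las = attachment_scores(gold, pred)
```

In this example three of four heads are correct (UAS = 0.75), but only two tokens have both the correct head and the correct label (LAS = 0.5), which is why LAS can never exceed UAS.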

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5552/10606639/d74e3d39b607/entropy-25-01444-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5552/10606639/7b242471ca84/entropy-25-01444-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5552/10606639/6dba72f0e6a1/entropy-25-01444-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5552/10606639/6d27e090f898/entropy-25-01444-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5552/10606639/d197152fa4e9/entropy-25-01444-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5552/10606639/c349c6ecbc2d/entropy-25-01444-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5552/10606639/003a095a4b37/entropy-25-01444-g007.jpg
