A hybrid Chinese word segmentation model for quality management-related texts based on transfer learning.

Affiliations

School of Management Science and Real Estate, Chongqing University, Chongqing, P. R. China.

College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing, P. R. China.

Publication information

PLoS One. 2022 Oct 7;17(10):e0270154. doi: 10.1371/journal.pone.0270154. eCollection 2022.

DOI: 10.1371/journal.pone.0270154
PMID: 36206249
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9543942/
Abstract

Text information mining is a key step toward data-driven automatic or semi-automatic quality management (QM). Chinese text has no explicit marks for word boundaries, so a word segmentation algorithm is a necessary pre-processing step. Because of the intrinsic characteristics of QM-related texts, word segmentation algorithms designed for general Chinese texts cannot be applied directly. Based on an analysis of QM-related texts, we summarized six features and proposed a hybrid Chinese word segmentation model, mTL-Bi-LSTM-MA-CRF, which integrates transfer learning (TL), bidirectional long short-term memory (Bi-LSTM), multi-head attention (MA), and a conditional random field (CRF) to address two problems: insufficient samples of QM-related texts and excessive cutting of idioms. The model is built in two steps. First, on top of a word embedding space, the Bi-LSTM learns context information, the MA mechanism allocates attention among subspaces, and the CRF learns label sequence constraints. Second, a modified TL method handles text feature extraction, adaptive layer weight learning, and loss function correction for selective learning. Experimental results show that the proposed model achieves good word segmentation results with only a relatively small set of samples.
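To make the idea of "label sequence constraints" concrete, here is a minimal Python sketch of word segmentation treated as character tagging. It assumes the common B/M/E/S scheme (begin, middle, end, single-character word); the abstract does not say which tag scheme the authors use. The emission scores are invented for illustration and stand in for the output of a trained encoder such as the Bi-LSTM with attention. The hard-coded transition table is a simplification of the transition scores a CRF would learn from data. All function names are hypothetical.

```python
# Illustrative sketch only: the B/M/E/S scheme, the scores, and all names
# are assumptions, not the authors' implementation.
TAGS = ("B", "M", "E", "S")
# Which tag may follow which under the B/M/E/S scheme.
ALLOWED = {"B": ("M", "E"), "M": ("M", "E"), "E": ("B", "S"), "S": ("B", "S")}
START_OK = ("B", "S")  # a sentence must start a word
END_OK = ("E", "S")    # a sentence must finish a word
NEG_INF = float("-inf")


def words_to_tags(words):
    """Convert a segmented sentence into one tag per character."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a dangling B/M at the end
        words.append(current)
    return words


def constrained_viterbi(emissions):
    """Return the highest-scoring tag path that obeys ALLOWED.

    emissions: one dict per character mapping each tag to a score.
    """
    if not emissions:
        return []
    score = [{t: emissions[0][t] if t in START_OK else NEG_INF for t in TAGS}]
    back = []
    for em in emissions[1:]:
        row, ptr = {}, {}
        for t in TAGS:
            best_prev, best = None, NEG_INF
            for p in TAGS:
                if t in ALLOWED[p] and score[-1][p] > best:
                    best_prev, best = p, score[-1][p]
            row[t] = best + em[t] if best_prev is not None else NEG_INF
            ptr[t] = best_prev
        score.append(row)
        back.append(ptr)
    last = max(END_OK, key=lambda t: score[-1][t])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Made-up scores for the three characters of "质量好". Picking the best tag
# per character independently would give M, E, B, which is not a valid path.
emissions = [
    {"B": 2.0, "M": 3.0, "E": 0.0, "S": 1.0},
    {"B": 0.0, "M": 0.5, "E": 2.0, "S": 0.0},
    {"B": 1.5, "M": 0.0, "E": 0.0, "S": 1.0},
]
tags = constrained_viterbi(emissions)
print(tags, tags_to_words("质量好", tags))
```

With these scores the constrained decoder returns B, E, S, so the characters are grouped as 质量 / 好. A real CRF layer replaces the yes-or-no transition table with learned transition scores, but the decoding step has the same shape.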

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95a8/9543942/7a5399728bfd/pone.0270154.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95a8/9543942/0db74e9d99b8/pone.0270154.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95a8/9543942/2f305d552d74/pone.0270154.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95a8/9543942/6db522bd00d6/pone.0270154.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95a8/9543942/86d2b5d6a7f2/pone.0270154.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95a8/9543942/b9636000942f/pone.0270154.g006.jpg
