Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning.

Authors

Chotirat Saranlita, Meesad Phayung

Affiliations

Department of Information Technology, Faculty of Information Technology and Digital Innovation, King Mongkut's University of Technology North Bangkok, Thailand.

Department of Information Technology Management, Faculty of Information Technology and Digital Innovation, King Mongkut's University of Technology North Bangkok, Thailand.

Publication information

Heliyon. 2021 Oct 19;7(10):e08216. doi: 10.1016/j.heliyon.2021.e08216. eCollection 2021 Oct.

DOI:10.1016/j.heliyon.2021.e08216
PMID:34746470
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8554172/
Abstract

Question classification is a crucial task for answer selection. It helps define the structure of a question sentence through features extracted from the sentence, such as who, when, where, and how. In this paper, we propose a methodology to improve question classification from texts by using feature selection and word embedding techniques. We conducted several experiments to evaluate the performance of the proposed methodology on two datasets (the TREC-6 dataset and a Thai sentence dataset), using term frequency and combined term frequency-inverse document frequency over Unigram, Unigram+Bigram, and Unigram+Trigram features. Both traditional and deep learning classifiers were used. The traditional classification models were Multinomial Naïve Bayes, Logistic Regression, and Support Vector Machine (SVM). The deep learning models were Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Networks (CNN), and a hybrid model combining CNN and BiLSTM. The experimental results showed that our methodology based on Part-of-Speech (POS) tagging was the best at improving question classification accuracy. Question category classification achieved an average micro F1-score of 0.98 when the SVM model was applied with all POS tags added on the TREC-6 dataset. The highest average micro F1-score on the Thai sentence dataset was 0.8, achieved by the CNN model with GloVe embeddings when focusing tags were added.
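
The feature pipeline named in the abstract (POS tags added to the text, n-gram TF-IDF features, micro-averaged F1 for evaluation) can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the abstract does not say how the POS tags are merged into the text, so appending each tag directly after its token is an assumption, and the function names, the hand-written Penn Treebank style tags, and the two toy questions are all hypothetical.

```python
import math
from collections import Counter


def add_pos_tags(tokens, tags):
    """Place each token's POS tag right after the token (an assumed scheme)."""
    augmented = []
    for token, tag in zip(tokens, tags):
        augmented.extend([token.lower(), tag])
    return augmented


def ngrams(tokens, n_values=(1, 2)):
    """Return space-joined n-grams for every n in n_values (default: unigram + bigram)."""
    grams = []
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs):
    """Weight each n-gram by relative term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(gram for doc in docs for gram in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            gram: (count / len(doc)) * math.log(n_docs / doc_freq[gram])
            for gram, count in counts.items()
        })
    return vectors


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label data: counts are pooled over all classes."""
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    # Every wrong prediction is one false positive (for the predicted class)
    # and one false negative (for the true class).
    fp = fn = len(y_true) - tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy questions with hand-written tags (hypothetical data, not from the paper).
questions = [
    (["Who", "wrote", "Hamlet"], ["WP", "VBD", "NNP"]),
    (["Where", "is", "Bangkok"], ["WRB", "VBZ", "NNP"]),
]
docs = [ngrams(add_pos_tags(tokens, tags)) for tokens, tags in questions]
vectors = tfidf(docs)
```

In this sketch an n-gram that occurs in every question, such as the shared tag `NNP`, receives zero weight, while question-specific pairs such as `who WP` keep a positive weight. For single-label classification the micro-averaged F1 computed this way equals plain accuracy, which is worth keeping in mind when reading the reported scores.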

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/867d/8554172/ade703984682/gr011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/867d/8554172/7c47a5c8cf9e/gr001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/867d/8554172/626b7716e6b6/gr002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/867d/8554172/480d7adc137a/gr003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/867d/8554172/59358eef784c/gr005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/867d/8554172/21bed615415e/gr006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/867d/8554172/8d570f1494fc/gr007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/867d/8554172/b7152ae25c38/gr004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/867d/8554172/a391404afe24/gr008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/867d/8554172/9298d3b38646/gr009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/867d/8554172/9b2786aadda8/gr010.jpg

Similar articles

1
Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning.
Heliyon. 2021 Oct 19;7(10):e08216. doi: 10.1016/j.heliyon.2021.e08216. eCollection 2021 Oct.
2
A comparative analysis on question classification task based on deep learning approaches.
PeerJ Comput Sci. 2021 Aug 3;7:e570. doi: 10.7717/peerj-cs.570. eCollection 2021.
3
A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records.
BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):65. doi: 10.1186/s12911-019-0762-7.
4
Improving part-of-speech tagging in Amharic language using deep neural network.
Heliyon. 2023 Jun 21;9(7):e17175. doi: 10.1016/j.heliyon.2023.e17175. eCollection 2023 Jul.
5
DeBERTa-BiLSTM: A multi-label classification model of Arabic medical questions using pre-trained models and deep learning.
Comput Biol Med. 2024 Mar;170:107921. doi: 10.1016/j.compbiomed.2024.107921. Epub 2024 Jan 4.
6
Prediction of cause of death from forensic autopsy reports using text classification techniques: A comparative study.
J Forensic Leg Med. 2018 Jul;57:41-50. doi: 10.1016/j.jflm.2017.07.001. Epub 2017 Jul 4.
7
CapsTM: capsule network for Chinese medical text matching.
BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):94. doi: 10.1186/s12911-021-01442-9.
8
Diagnosis and classification of cancer using hybrid model based on ReliefF and convolutional neural network.
Med Hypotheses. 2020 Apr;137:109577. doi: 10.1016/j.mehy.2020.109577. Epub 2020 Jan 20.
9
One-dimensional convolutional neural network and hybrid deep-learning paradigm for classification of specific language impaired children using their speech.
Comput Methods Programs Biomed. 2022 Jan;213:106487. doi: 10.1016/j.cmpb.2021.106487. Epub 2021 Oct 22.
10
Artificial Intelligence Learning Semantics via External Resources for Classifying Diagnosis Codes in Discharge Notes.
J Med Internet Res. 2017 Nov 6;19(11):e380. doi: 10.2196/jmir.8344.

Cited by

1
A scalable thin-film defect quantify model under imbalanced regression and classification task based on computer vision.
Heliyon. 2023 Feb 11;9(2):e13701. doi: 10.1016/j.heliyon.2023.e13701. eCollection 2023 Feb.
2
Multi-Task Learning Model for Kazakh Query Understanding.
Sensors (Basel). 2022 Dec 14;22(24):9810. doi: 10.3390/s22249810.