A Method of Short Text Representation Fusion with Weighted Word Embeddings and Extended Topic Information.

Authors

Liu Wenfu, Pang Jianmin, Du Qiming, Li Nan, Yang Shudan

Affiliations

State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China.

State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System, Luoyang 471003, China.

Publication information

Sensors (Basel). 2022 Jan 29;22(3):1066. doi: 10.3390/s22031066.

DOI: 10.3390/s22031066
PMID: 35161808
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8839561/

Abstract

Short text representation is one of the basic and key tasks of NLP. The traditional method is to simply merge the bag-of-words model and the topic model, which may lead to the problem of ambiguity in semantic information, and leave topic information sparse. We propose an unsupervised text representation method that involves fusing word embeddings and extended topic information. Following this, two fusion strategies of weighted word embeddings and extended topic information are designed: static linear fusion and dynamic fusion. This method can highlight important semantic information, flexibly fuse topic information, and improve the capabilities of short text representation. We use classification and prediction tasks to verify the effectiveness of the method. The testing results show that the method is valid.
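
The abstract names static linear fusion but gives no formulas, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes TF-IDF as the word weighting, a weighted average as the text-level embedding, and a scaled concatenation as the "linear" fusion; the function names, the `alpha` parameter, and the toy vectors are all hypothetical. Dynamic fusion is not sketched because the abstract does not describe it.

```python
import math
from collections import Counter

def tfidf_weights(doc, corpus):
    """TF-IDF weight per distinct token in doc (assumed weighting scheme)."""
    tf = Counter(doc)
    n_docs = len(corpus)
    weights = {}
    for token, count in tf.items():
        df = sum(1 for d in corpus if token in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
        weights[token] = (count / len(doc)) * idf
    return weights

def weighted_embedding(doc, corpus, embeddings):
    """Weighted average of word vectors; tokens without a vector are skipped."""
    weights = tfidf_weights(doc, corpus)
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    norm = 0.0
    for token, w in weights.items():
        vec = embeddings.get(token)
        if vec is None:
            continue
        total = [t + w * v for t, v in zip(total, vec)]
        norm += w
    return [t / norm for t in total] if norm > 0 else total

def static_linear_fusion(word_vec, topic_vec, alpha=0.5):
    """Concatenate both views, scaled by alpha and (1 - alpha) (assumed form)."""
    return [alpha * x for x in word_vec] + [(1 - alpha) * t for t in topic_vec]

# Toy data: two tokenized short texts, 2-d word vectors, a 2-topic distribution.
corpus = [["stock", "market", "falls"], ["team", "wins", "match"]]
embeddings = {
    "stock": [1.0, 0.0],
    "market": [0.8, 0.2],
    "team": [0.0, 1.0],
    "match": [0.1, 0.9],
}
topic_vec = [0.9, 0.1]  # stand-in for a topic model's output for corpus[0]

word_vec = weighted_embedding(corpus[0], corpus, embeddings)
fused = static_linear_fusion(word_vec, topic_vec, alpha=0.6)
```

Here `fused` has one block of coordinates per view, so a downstream classifier sees both the semantic and the topic signal, and `alpha` sets their relative scale. A weighted sum of the two vectors would be another reading of "linear fusion", but it would require both views to share a dimension.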

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fcf/8839561/7b2e05d9251b/sensors-22-01066-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fcf/8839561/293d7b465b9c/sensors-22-01066-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fcf/8839561/1da863d6a95d/sensors-22-01066-g002.jpg

Similar articles

1
A Method of Short Text Representation Fusion with Weighted Word Embeddings and Extended Topic Information.
Sensors (Basel). 2022 Jan 29;22(3):1066. doi: 10.3390/s22031066.
2
A Method of Short Text Representation Based on the Feature Probability Embedded Vector.
Sensors (Basel). 2019 Aug 28;19(17):3728. doi: 10.3390/s19173728.
3
Training and intrinsic evaluation of lightweight word embeddings for the clinical domain in Spanish.
Front Artif Intell. 2022 Sep 21;5:970517. doi: 10.3389/frai.2022.970517. eCollection 2022.
4
Nonparametric Spherical Topic Modeling with Word Embeddings.
Proc Conf Assoc Comput Linguist Meet. 2016 Aug;2016:537-542. doi: 10.18653/v1/P16-2087.
5
A Topic Recognition Method of News Text Based on Word Embedding Enhancement.
Comput Intell Neurosci. 2022 Feb 16;2022:4582480. doi: 10.1155/2022/4582480. eCollection 2022.
6
Gender-sensitive word embeddings for healthcare.
J Am Med Inform Assoc. 2022 Jan 29;29(3):415-423. doi: 10.1093/jamia/ocab279.
7
Short text topic modelling using local and global word-context semantic correlation.
Multimed Tools Appl. 2023 Feb 2:1-23. doi: 10.1007/s11042-023-14352-x.
8
Utility of General and Specific Word Embeddings for Classifying Translational Stages of Research.
AMIA Annu Symp Proc. 2018 Dec 5;2018:1405-1414. eCollection 2018.
9
A comparison of word embeddings for the biomedical natural language processing.
J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.
10
Unsupervised low-dimensional vector representations for words, phrases and text that are transparent, scalable, and produce similarity metrics that are not redundant with neural embeddings.
J Biomed Inform. 2019 Feb;90:103096. doi: 10.1016/j.jbi.2019.103096. Epub 2019 Jan 14.

Cited by

1
Attention-aware with stacked embedding for sentiment analysis of student feedback through deep learning techniques.
PeerJ Comput Sci. 2024 Sep 2;10:e2283. doi: 10.7717/peerj-cs.2283. eCollection 2024.
2
Movie Scene Event Extraction with Graph Attention Network Based on Argument Correlation Information.
Sensors (Basel). 2023 Feb 17;23(4):2285. doi: 10.3390/s23042285.
3
Few-Shot Text Classification with Global-Local Feature Information.
Sensors (Basel). 2022 Jun 11;22(12):4420. doi: 10.3390/s22124420.

References

1
A Method of Short Text Representation Based on the Feature Probability Embedded Vector.
Sensors (Basel). 2019 Aug 28;19(17):3728. doi: 10.3390/s19173728.