Liu Wenfu, Pang Jianmin, Du Qiming, Li Nan, Yang Shudan
State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China.
State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System, Luoyang 471003, China.
Sensors (Basel). 2022 Jan 29;22(3):1066. doi: 10.3390/s22031066.
Short text representation is one of the fundamental tasks of natural language processing (NLP). The traditional approach simply merges the bag-of-words model with a topic model, which can leave semantic information ambiguous and topic information sparse. We propose an unsupervised text representation method that fuses word embeddings with extended topic information. Two strategies are designed for fusing the weighted word embeddings with the extended topic information: static linear fusion and dynamic fusion. The method highlights important semantic information, fuses topic information flexibly, and improves the quality of short text representations. We verify its effectiveness on classification and prediction tasks, and the test results show that the method is effective.
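The abstract names two fusion strategies but gives no formulas, so the following is only a minimal sketch of what static linear fusion and dynamic fusion could look like, not the authors' implementation. Everything here is an assumption made for illustration: the per-word weights (for example, TF-IDF scores), the use of scaled concatenation as the "linear" fusion, the fixed coefficient `alpha`, and the entropy-based rule that sets the fusion weight per document in the dynamic variant.

```python
import math


def weighted_embedding(tokens, embeddings, weights):
    """Weighted average of word vectors; tokens without a vector are skipped."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok in embeddings:
            w = weights.get(tok, 1.0)  # assumed importance weight, e.g. TF-IDF
            weight_sum += w
            for i, x in enumerate(embeddings[tok]):
                total[i] += w * x
    if weight_sum == 0.0:
        return total  # no known tokens: all-zero vector
    return [x / weight_sum for x in total]


def static_linear_fusion(word_vec, topic_vec, alpha=0.5):
    """Concatenate both views, scaled by one fixed coefficient (assumed form)."""
    return [alpha * x for x in word_vec] + [(1.0 - alpha) * t for t in topic_vec]


def topic_confidence(topic_vec):
    """1 minus normalized entropy: 1 for a peaked distribution, 0 for uniform."""
    k = len(topic_vec)
    if k < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in topic_vec if p > 0)
    return 1.0 - entropy / math.log(k)


def dynamic_fusion(word_vec, topic_vec):
    """Per-document coefficient: weight topics more when they are confident.

    Hypothetical rule: alpha moves from 1.0 (uniform topics, ignored)
    down to 0.5 (fully peaked topics, weighted equally with the words).
    """
    alpha = 1.0 - 0.5 * topic_confidence(topic_vec)
    return static_linear_fusion(word_vec, topic_vec, alpha=alpha)


# Toy data, invented for the example.
embeddings = {"cheap": [1.0, 0.0], "flight": [0.0, 1.0]}
weights = {"cheap": 1.0, "flight": 3.0}
tokens = ["cheap", "flight", "deal"]  # "deal" has no vector and is skipped

word_vec = weighted_embedding(tokens, embeddings, weights)
static_vec = static_linear_fusion(word_vec, [0.5, 0.5], alpha=0.5)
dynamic_vec = dynamic_fusion(word_vec, [1.0, 0.0])
```

The difference between the two variants is only where the coefficient comes from: the static version uses one value for the whole corpus, while the dynamic version computes it from each document's own topic distribution, which is one plausible way to fuse topic information "flexibly" when short texts yield sparse or uninformative topics.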