Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan 33302, Taiwan.
Department of Physical Medicine and Rehabilitation, Chang Gung Memorial Hospital, Taoyuan 33302, Taiwan.
PLoS One. 2019 Nov 12;14(11):e0224452. doi: 10.1371/journal.pone.0224452. eCollection 2019.
This study presents a novel approach to predicting user interaction with social media posts using machine learning algorithms. The posts are converted to vector form using the word2vec and doc2vec models, and the two methods are compared to determine the best approach for generating word embeddings. The generated word embeddings of each post, combined with other attributes such as publication time, post type, and total interactions, are used to train the machine learning algorithms. A deep neural network (DNN), an Extreme Learning Machine (ELM), and a Long Short-Term Memory (LSTM) network are compared on their prediction of the total interactions for a given post. For word2vec, the word vectors are created using both the continuous bag-of-words (CBOW) and skip-gram models; the pre-trained word vectors provided by Google are also used in the analysis. For doc2vec, the word embeddings are created using both the Distributed Memory model of Paragraph Vectors (PV-DM) and the Distributed Bag of Words model of Paragraph Vectors (PV-DBOW). A word embedding is also created using PV-DBOW combined with skip-gram.
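As a rough illustration of how a post's embedding might be combined with its other attributes into one model input, the sketch below may help. It is not the authors' implementation: the toy vector table, the post-type categories, the hour scaling, and the helper names `average_embedding` and `build_features` are all assumptions made for the example. Averaging word vectors is only one simple way to get a post-level vector from word2vec output; doc2vec yields a document vector directly. The total interaction count is treated here as the prediction target and is therefore left out of the feature vector.

```python
from datetime import datetime

# Hypothetical toy embedding table. In the study these vectors would come
# from a trained word2vec or doc2vec model, not from hand-written values.
TOY_VECTORS = {
    "new": [0.2, 0.1, 0.0],
    "product": [0.4, 0.3, 0.2],
    "launch": [0.0, 0.2, 0.4],
}
# Assumed post-type categories, for illustration only.
POST_TYPES = ["photo", "video", "link", "status"]


def average_embedding(tokens, vectors, dim):
    """Return the mean vector of the known tokens, or zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def build_features(text, published, post_type):
    """Concatenate post embedding, scaled publication hour, and one-hot type."""
    tokens = text.lower().split()
    embedding = average_embedding(tokens, TOY_VECTORS, dim=3)
    hour = published.hour / 23.0  # scale hour of day to [0, 1]
    one_hot = [1.0 if post_type == t else 0.0 for t in POST_TYPES]
    return embedding + [hour] + one_hot


features = build_features(
    "New product launch", datetime(2019, 11, 12, 18, 0), "photo"
)
print(len(features))
```

A vector of this shape, paired with the post's total interaction count as the label, could then be passed to a regressor such as a DNN, ELM, or LSTM.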