Kastrati Zenun, Kurti Arianit, Imran Ali Shariq

Dept. of Computer Science and Media Technology, Linnaeus University, Växjö, Sweden.

Dept. of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway.

Data Brief. 2020 Jan 3;28:105090. doi: 10.1016/j.dib.2019.105090. eCollection 2020 Feb.

In this article, we present a dataset containing word embeddings and document topic distribution vectors generated from massive open online course (MOOC) video lecture transcripts. Transcripts of 12,032 video lectures from 200 courses were collected from the Coursera learning platform. This large corpus of transcripts was used as input to two well-known NLP techniques, Word2Vec and Latent Dirichlet Allocation (LDA), which generate word embeddings and topic vectors, respectively. We used the Word2Vec and LDA implementations in the Gensim package for Python. The data presented in this article are related to the research article entitled "Integrating word embeddings and document topics with deep learning in a video classification framework" [1]. The dataset is hosted in the Mendeley Data repository [2].
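To make the two kinds of vectors concrete: Word2Vec maps each vocabulary word to a dense vector, while LDA assigns each transcript a probability distribution over topics. The sketch below shows one simple, assumed way to combine them into a single feature vector per lecture, by averaging the embeddings of the transcript's words and appending the transcript's topic distribution. It is an illustration only, not necessarily the integration method used in [1]. The function names and toy vectors are hypothetical, and reading the Mendeley files is omitted because their format is not described here.

```python
import math
from typing import Dict, List, Sequence


def average_embedding(tokens: Sequence[str],
                      word_vectors: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Mean of the embeddings of in-vocabulary tokens; zero vector if none match."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def document_features(tokens: Sequence[str],
                      word_vectors: Dict[str, List[float]],
                      topic_distribution: Sequence[float],
                      dim: int) -> List[float]:
    """Hypothetical helper: averaged word embedding followed by the LDA topic vector."""
    # A topic distribution is a probability vector, so its entries should sum to 1.
    if not math.isclose(sum(topic_distribution), 1.0, abs_tol=1e-6):
        raise ValueError("topic distribution must sum to 1")
    return average_embedding(tokens, word_vectors, dim) + list(topic_distribution)


if __name__ == "__main__":
    # Toy 3-dimensional embeddings and a 2-topic distribution (made-up values).
    vectors = {"gradient": [1.0, 0.0, 2.0], "descent": [3.0, 2.0, 0.0]}
    transcript = ["gradient", "descent", "unknownword"]  # out-of-vocabulary words are skipped
    print(document_features(transcript, vectors, [0.75, 0.25], dim=3))
    # prints [2.0, 1.0, 1.0, 0.75, 0.25]
```

Averaging keeps the embedding part a fixed length regardless of transcript length, so every lecture yields a vector of size `dim` plus the number of topics.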

WET: Word embedding-topic distribution vectors for MOOC video lectures dataset.