Finding scientific topics.

Author information

Griffiths Thomas L, Steyvers Mark

Affiliation

Department of Psychology, Stanford University, Stanford, CA 94305, USA.

Publication information

Proc Natl Acad Sci U S A. 2004 Apr 6;101 Suppl 1(Suppl 1):5228-35. doi: 10.1073/pnas.0307752101. Epub 2004 Feb 10.

Abstract

A first step in identifying the content of a document is determining which topics that document addresses. We describe a generative model for documents, introduced by Blei, Ng, and Jordan [Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3, 993-1022], in which each document is generated by choosing a distribution over topics and then choosing each word in the document from a topic selected according to this distribution. We then present a Markov chain Monte Carlo algorithm for inference in this model. We use this algorithm to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics. We show that the extracted topics capture meaningful structure in the data, consistent with the class designations provided by the authors of the articles, and outline further applications of this analysis, including identifying "hot topics" by examining temporal dynamics and tagging abstracts to illustrate semantic content.
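The generative process described in the abstract, and one way to invert it, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract says only "Markov chain Monte Carlo", so collapsed Gibbs sampling is used here as one standard MCMC scheme for this kind of model. The function names, the symmetric Dirichlet hyperparameters `alpha` and `beta`, and the toy topics are all hypothetical choices made for the example.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word, alpha, n_words, rng):
    """Generative step: choose a topic mixture, then a topic and a word per position."""
    n_topics = len(topic_word)
    theta = sample_dirichlet([alpha] * n_topics, rng)  # document's distribution over topics
    doc = []
    for _ in range(n_words):
        z = rng.choices(range(n_topics), weights=theta)[0]  # topic for this position
        w = rng.choices(range(len(topic_word[z])), weights=topic_word[z])[0]
        doc.append(w)
    return doc

def init_counts(docs, n_topics, vocab_size, rng):
    """Assign random topics and build the count tables the sampler maintains."""
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    n_dk = [[0] * n_topics for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                # total words per topic
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    return z, n_dk, n_kw, n_k

def gibbs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, rng):
    """One collapsed Gibbs sweep: resample each word's topic given all the others."""
    n_topics, vocab_size = len(n_k), len(n_kw[0])
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # Remove the current assignment from the counts.
            n_dk[d][k] -= 1
            n_kw[k][w] -= 1
            n_k[k] -= 1
            # Unnormalized conditional: (word given topic) * (topic given document).
            # The document-length denominator is the same for every topic, so it is omitted.
            weights = [
                (n_kw[j][w] + beta) / (n_k[j] + vocab_size * beta) * (n_dk[d][j] + alpha)
                for j in range(n_topics)
            ]
            k = rng.choices(range(n_topics), weights=weights)[0]
            # Record the new assignment.
            z[d][i] = k
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

rng = random.Random(0)
# Two toy topics over a 4-word vocabulary (hypothetical values).
topic_word = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
docs = [generate_document(topic_word, 0.5, 20, rng) for _ in range(10)]
state = init_counts(docs, n_topics=2, vocab_size=4, rng=rng)
for _ in range(50):
    gibbs_sweep(docs, *state, alpha=0.5, beta=0.1, rng=rng)
```

After the sweeps, the rows of `n_kw` (the third element of `state`) hold per-topic word counts from which topic-word distributions could be estimated. The sketch fixes the number of topics by hand; the Bayesian model selection the abstract mentions for choosing that number is not shown.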

Similar articles

1
Finding scientific topics.
Proc Natl Acad Sci U S A. 2004 Apr 6;101 Suppl 1(Suppl 1):5228-35. doi: 10.1073/pnas.0307752101. Epub 2004 Feb 10.
2
Mapping topics and topic bursts in PNAS.
Proc Natl Acad Sci U S A. 2004 Apr 6;101 Suppl 1(Suppl 1):5287-90. doi: 10.1073/pnas.0307626100. Epub 2004 Feb 20.
3
The simultaneous evolution of author and paper networks.
Proc Natl Acad Sci U S A. 2004 Apr 6;101 Suppl 1(Suppl 1):5266-73. doi: 10.1073/pnas.0307625100. Epub 2004 Feb 19.
4
Mixed-membership models of scientific publications.
Proc Natl Acad Sci U S A. 2004 Apr 6;101 Suppl 1(Suppl 1):5220-7. doi: 10.1073/pnas.0307760101. Epub 2004 Mar 12.
5
Mapping knowledge domains: characterizing PNAS.
Proc Natl Acad Sci U S A. 2004 Apr 6;101 Suppl 1(Suppl 1):5192-9. doi: 10.1073/pnas.0307509100. Epub 2004 Feb 12.
7
Link-topic model for biomedical abbreviation disambiguation.
J Biomed Inform. 2015 Feb;53:367-80. doi: 10.1016/j.jbi.2014.12.013. Epub 2014 Dec 30.
10
Connecting the latent multinomial.
Biometrics. 2015 Dec;71(4):1070-80. doi: 10.1111/biom.12333. Epub 2015 Jun 1.

Cited by

9
30 years of climate related phenological research: themes and trends.
Int J Biometeorol. 2025 Jun;69(6):1459-1473. doi: 10.1007/s00484-025-02903-w. Epub 2025 May 12.
