Domain-topic models with chained dimensions: Charting an emergent domain of a major oncology conference.

Authors

Hannud Abdo Alexandre, Cointet Jean-Philippe, Bourret Pascale, Cambrosio Alberto

Affiliations

LISIS, Université Gustave Eiffel, INRAE, Marne-la-Vallée, France.

Garoa Hacker Clube, São Paulo, Brazil.

Publication information

J Assoc Inf Sci Technol. 2022 Jul;73(7):992-1011. doi: 10.1002/asi.24606. Epub 2021 Nov 24.

Abstract

This paper presents a contribution to the study of bibliographic corpora through science mapping. From a graph representation of documents and their textual dimension, stochastic block models can provide a simultaneous clustering of documents and words that we call a domain-topic model. Previous work investigated the resulting topics, or word clusters, while ours focuses on the study of the document clusters we call domains. To enable the description and interactive navigation of domains, we introduce measures and interfaces that consider the structure of the model to relate both types of clusters. We then present a procedure that extends the block model to cluster metadata attributes of documents, which we call a domain-chained model, noting that our measures and interfaces transpose to metadata clusters. We provide an example application to a corpus relevant to current science, technology and society (STS) research and an interesting case for our approach: the abstracts presented between 1995 and 2017 at the American Society of Clinical Oncology Annual Meeting, the major oncology research conference. Through a sequence of domain-topic and domain-chained models, we identify and describe a group of domains that have notably grown through the last decades and which we relate to the establishment of "oncopolicy" as a major concern in oncology.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b793/9299004/6335bc7f1b58/ASI-73-992-g008.jpg
