The recipes of Philosophy of Science: Characterizing the semantic structure of corpora by means of topic associative rules.

Authors

Malaterre Christophe, Chartier Jean-François, Lareau Francis

Affiliations

Département de philosophie, Université du Québec à Montréal (UQAM), Montréal, Québec, Canada.

Centre interuniversitaire de recherche sur la science et la technologie (CIRST), Montréal, Québec, Canada.

Publication information

PLoS One. 2020 Nov 18;15(11):e0242353. doi: 10.1371/journal.pone.0242353. eCollection 2020.

DOI: 10.1371/journal.pone.0242353
PMID: 33206699
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7673543/

Abstract

Scientific articles have semantic contents that are usually quite specific to their disciplinary origins. To characterize such contents, topic-modeling algorithms make it possible to identify topics that run throughout corpora. However, they remain limited when it comes to investigating the extent to which topics are used together in specific documents and form particular associative patterns. Here, we propose to characterize such patterns by identifying "topic associative rules" that describe how topics are associated within given sets of documents. As a case study, we use a corpus from a subfield of the humanities, the philosophy of science, consisting of the complete full-text content of one of its main journals: Philosophy of Science. On the basis of a pre-existing topic model, we develop a methodology with which we infer a set of 96 topic associative rules that characterize specific types of articles according to the distinctive patterns in which these articles combine topics. Such rules offer a finer-grained window onto the semantic content of the corpus and can be interpreted as "topical recipes" for distinct types of philosophy of science articles. Examining rule networks and rule predictive success for different article types, we find a positive correlation between topological features of rule networks (connectivity) and the reliability of rule predictions (as summarized by the F-measure). Topic associative rules thereby not only help characterize the semantic contents of corpora at a finer granularity than topic modeling, but may also help to classify documents or identify document types, for instance to improve natural language generation processes.
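
To make the idea of a rule that links a combination of topics to an article type more concrete, here is a minimal sketch. It is not the authors' methodology: the toy corpus, the topic names, the article-type labels "A" and "B", the thresholds, and the helper names `rule_metrics` and `mine_rules` are all hypothetical, and the brute-force search stands in for whatever mining procedure the paper actually uses. The sketch only illustrates how support, confidence, recall and the F-measure of such a rule could be computed once each document has been reduced to a set of topics.

```python
from itertools import combinations

# Hypothetical toy corpus: (topics present in the document, article type).
DOCS = [
    ({"causation", "probability"}, "A"),
    ({"causation", "probability", "explanation"}, "A"),
    ({"causation", "explanation"}, "B"),
    ({"probability", "confirmation"}, "B"),
    ({"causation", "probability"}, "A"),
    ({"confirmation", "explanation"}, "B"),
]


def rule_metrics(docs, antecedent, label):
    """Score the rule 'documents containing all antecedent topics are of type label'."""
    antecedent = frozenset(antecedent)
    # Labels of the documents that contain every topic of the antecedent.
    matched = [lab for topics, lab in docs if antecedent <= topics]
    true_pos = sum(1 for lab in matched if lab == label)
    n_label = sum(1 for _, lab in docs if lab == label)

    support = true_pos / len(docs)                           # share of corpus covered correctly
    confidence = true_pos / len(matched) if matched else 0.0  # precision of the rule
    recall = true_pos / n_label if n_label else 0.0
    denom = confidence + recall
    f_measure = 2 * confidence * recall / denom if denom else 0.0
    return {"support": support, "confidence": confidence,
            "recall": recall, "f_measure": f_measure}


def mine_rules(docs, max_size=2, min_support=0.3, min_confidence=0.8):
    """Enumerate small topic combinations and keep the rules above both thresholds."""
    topics = sorted(set().union(*(t for t, _ in docs)))
    labels = sorted({lab for _, lab in docs})
    rules = []
    for size in range(1, max_size + 1):
        for antecedent in combinations(topics, size):
            for label in labels:
                m = rule_metrics(docs, antecedent, label)
                if m["support"] >= min_support and m["confidence"] >= min_confidence:
                    rules.append((antecedent, label, m))
    return rules


if __name__ == "__main__":
    for antecedent, label, m in mine_rules(DOCS):
        print(f"{set(antecedent)} -> type {label}: "
              f"confidence={m['confidence']:.2f}, F={m['f_measure']:.2f}")
```

On this toy corpus only two rules pass the thresholds: "confirmation" alone points to type B, and "causation" together with "probability" points to type A. The exhaustive enumeration is chosen for readability; on a real corpus with many topics, a dedicated frequent-itemset algorithm would be the more practical choice.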
