Chakrabarti Chayan, Jones Thomas B, Luger George F, Xu Jiawei F, Turner Matthew D, Laird Angela R, Turner Jessica A
Department of Computer Science, University of New Mexico, Albuquerque, New Mexico, USA.
Department of Computer Science, University of New Mexico, Albuquerque, New Mexico, USA ; Mind Research Network, Albuquerque, New Mexico, USA.
J Biomed Semantics. 2014 Jun 3;5(Suppl 1 Proceedings of the Bio-Ontologies Spec Interest G):S2. doi: 10.1186/2041-1480-5-S1-S2. eCollection 2014.
Ontologies encode relationships within a domain in robust data structures that can be used to annotate data objects, including scientific papers, in ways that ease tasks such as search and meta-analysis. However, the annotation process requires significant time and effort when performed by humans. Text mining algorithms can facilitate this process, but their analysis relies mainly on keyword, synonym, and semantic matching; they do not leverage the information embedded in an ontology's structure.
We present a probabilistic framework that facilitates the automatic annotation of literature by indirectly modeling the restrictions among the different classes in the ontology. Our research focuses on annotating human functional neuroimaging literature within the Cognitive Paradigm Ontology (CogPO). We use an approach that combines the stochastic simplicity of naïve Bayes with the formal transparency of decision trees. Our data structure is easily modifiable to reflect changing domain knowledge.
We compare our results across naïve Bayes, Bayesian Decision Trees, and Constrained Decision Tree classifiers that keep a human expert in the loop, using the micro-averaged F1 score as the quality measure.
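For readers unfamiliar with the metric, the micro-averaged F1 score pools true positives, false positives, and false negatives over all documents and labels before computing precision and recall. The following is a minimal sketch of that standard definition for multi-label annotation, not the authors' evaluation code; the function name `micro_f1` and the label names are hypothetical placeholders.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over parallel lists of label sets (one set per document)."""
    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, predicted):
        tp += len(gold_labels & pred_labels)   # labels correctly predicted
        fp += len(pred_labels - gold_labels)   # labels predicted but not in the gold set
        fn += len(gold_labels - pred_labels)   # gold labels that were missed
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correct
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for two papers (label names are illustrative only).
gold = [{"label_a", "label_b"}, {"label_c"}]
predicted = [{"label_a"}, {"label_c", "label_d"}]
score = micro_f1(gold, predicted)
```

Because counts are pooled before averaging, frequent labels influence the score more than rare ones, which is the usual reason for reporting the micro-averaged variant on imbalanced annotation sets.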
Unlike traditional text mining algorithms, our framework can model the knowledge encoded by the dependencies in an ontology, albeit indirectly. We successfully exploit the fact that CogPO has both explicitly stated restrictions and implicit dependencies in the form of patterns in the expert-curated annotations.