Modifier Ontologies for frequency, certainty, degree, and coverage phenotype modifier.

Author information

Endara Lorena, Thessen Anne E, Cole Heather A, Walls Ramona, Gkoutos Georgios, Cao Yujie, Chong Steven S, Cui Hong

Affiliations

University of Florida, Gainesville, United States of America.

The Ronin Institute for Independent Scholarship, Montclair, NJ, United States of America.

Publication information

Biodivers Data J. 2018 Nov 28;(6):e29232. doi: 10.3897/BDJ.6.e29232. eCollection 2018.

Abstract

When phenotypic characters are described in the literature, they may be constrained or clarified with additional information such as the location or degree of expression; these terms are called "modifiers". With efforts underway to convert narrative character descriptions to computable data, ontologies for such modifiers are needed. Such ontologies can also be used to guide term usage in future publications. Spatial and method modifiers are the subjects of ontologies that have already been developed or are under development. In this work, frequency (e.g., rarely, usually), certainty (e.g., probably, definitely), degree (e.g., slightly, extremely), and coverage modifiers (e.g., sparsely, entirely) are collected, reviewed, and used to create two modifier ontologies with different design considerations. The basic goal is to express the sequential relationships within a type of modifier (for example, "usually" is more frequent than "rarely") so that data annotated with ontology terms can be classified accordingly. Two designs are proposed for the ontology, both using the list pattern: a closed ordered list (i.e., five-bin design) and an open ordered list design. The five-bin design puts the modifier terms into a set of 5 fixed bins with interval object properties, for example, one_level_more/less_frequently_than, where new terms can only be added as synonyms to existing classes. The open list approach starts with 5 bins, but supports the extensibility of the list via ordinal properties, for example, more/less_frequently_than, allowing new terms to be inserted as a new class anywhere in the list. The consequences of the different design decisions are discussed in the paper. CharaParser was used to extract modifiers from plant, ant, and other taxonomic descriptions. After a manual screening, 130 modifier words were selected as the candidate terms for the modifier ontologies.
Four curators/experts (three biologists and one information scientist specializing in biosemantics) reviewed and categorized the terms into 20 bins using the Ontology Term Organizer (OTO) (http://biosemantics.arizona.edu/OTO). Inter-curator variations were reviewed and expressed in the final ontologies. Frequency, certainty, degree, and coverage terms with complete agreement among all curators were used as class labels or exact synonyms. Terms with different interpretations were either excluded or included using "broader synonym" or "not recommended" annotation properties. These annotations explicitly make the user aware of the semantic ambiguity associated with the terms and of whether they should be used with caution or avoided. Expert categorization results showed that 16 out of 20 bins contained terms with full agreement, suggesting that dividing the modifiers into 5 levels/bins balances the need to differentiate modifiers against the need for the ontology to reflect user consensus. Two ontologies, developed using the Protege ontology editor, are made available as OWL files and can be downloaded from https://github.com/biosemantics/ontologies. We built the first two modifier ontologies following a consensus-based approach with terms commonly used in taxonomic literature. The five-bin ontology has been used in the Explorer of Taxon Concepts web toolkit to compute the similarity between characters extracted from literature, to facilitate taxon concept alignment. The two ontologies will also be used in an ontology-informed authoring tool for taxonomists to facilitate consistency in modifier term usage.
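
The difference between the closed (five-bin) and open ordered list designs can be illustrated with a minimal Python sketch. This is not the published OWL implementation: the class names, method names, and the five frequency labels below are hypothetical, chosen only to mirror the behavior the abstract describes (synonym-only additions with interval comparisons versus free insertion with ordinal comparisons).

```python
from dataclasses import dataclass, field


@dataclass
class ClosedOrderedList:
    """Closed list: a fixed set of bins; new terms may only be added as synonyms."""
    bins: tuple  # ordered from least to most frequent
    synonyms: dict = field(default_factory=dict)  # synonym -> bin label

    def add_synonym(self, term, bin_label):
        if bin_label not in self.bins:
            raise ValueError(f"closed list has no bin {bin_label!r}")
        self.synonyms[term] = bin_label

    def level(self, term):
        # Resolve a synonym to its bin, then return the bin's position.
        return self.bins.index(self.synonyms.get(term, term))

    def levels_more_frequent(self, a, b):
        """Interval reading: number of bins by which a sits above b."""
        return self.level(a) - self.level(b)


@dataclass
class OpenOrderedList:
    """Open list: a new term may be inserted as its own class anywhere."""
    terms: list  # ordered from least to most frequent

    def insert_after(self, new_term, existing):
        self.terms.insert(self.terms.index(existing) + 1, new_term)

    def more_frequent_than(self, a, b):
        """Ordinal reading: only the relative order is meaningful."""
        return self.terms.index(a) > self.terms.index(b)


# Hypothetical bin labels, not the ontology's actual class labels.
BINS = ("never", "rarely", "sometimes", "usually", "always")

closed = ClosedOrderedList(BINS)
closed.add_synonym("seldom", "rarely")         # allowed: joins an existing bin
gap = closed.levels_more_frequent("usually", "seldom")

open_list = OpenOrderedList(list(BINS))
open_list.insert_after("often", "sometimes")   # allowed: becomes a new class
ordered = open_list.more_frequent_than("usually", "often")
```

In this sketch, the closed list can report a stable distance between terms because the number of bins never changes, while the open list only guarantees relative order: inserting "often" shifts the positions of every term above it, so a fixed "one level more" relation would no longer hold between its former neighbors.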
