School of Information, University of Arizona, Tucson, AZ 85705, USA.
Department of Biological Sciences, University of Manitoba, Winnipeg, MB R3T 2N2, Canada.
Database (Oxford). 2020 Nov 20;2020. doi: 10.1093/database/baaa079.
To make published phenotype information usable in computational analyses, there have been efforts to convert descriptions of phenotype characters from human languages into ontologized statements. This postpublication curation process is not only slow and costly but also burdened with significant intercurator variation (including curator-author variation), because different individuals interpret a character differently. This problem is inherent in any human-based intellectual activity. To address it and avoid postpublication curation, a critical step is for authors to make scientific publications semantically clear (i.e. computable) at the time of publication. To help authors efficiently produce species phenotype descriptions that are also computable data, we are experimenting with an author-driven ontology development approach and are developing and evaluating a series of ontology-aware software modules that create publishable species descriptions readily usable in scientific computations. This paper reports the first software module prototype, Measurement Recorder, which was developed to assist authors in defining continuous measurements. Two usability studies of the software were conducted with 22 undergraduate students majoring in information science and 32 in biology. Results suggest that participants can use Measurement Recorder without training and find it easy to use after limited practice. Participants also appreciate the semantic enhancement features. Measurement Recorder's character reuse features increase character convergence among participants by 48% and have the potential to further reduce user errors in defining characters. A set of software design issues has also been identified and corrected. Measurement Recorder enables authors to record measurements in a semantically clear manner and enriches the phenotype ontology along the way. 
Future work includes representing the semantic data as Resource Description Framework (RDF) knowledge graphs and characterizing the division of work between authors as domain knowledge providers and ontology engineers as knowledge formalizers in this new author-driven ontology development approach.