Semantic annotation of morphological descriptions: an overall strategy.

Affiliation

School of Information Resources and Library Science, University of Arizona, 1515 E. First Street, Tucson, Arizona 85719, USA.

Publication information

BMC Bioinformatics. 2010 May 25;11:278. doi: 10.1186/1471-2105-11-278.

Abstract

BACKGROUND

Large volumes of morphological descriptions of whole organisms have been created as print or electronic text in a human-readable format. Converting the descriptions into computer-readable formats gives a new life to the valuable knowledge on biodiversity. Research in this area started 20 years ago, yet not sufficient progress has been made to produce an automated system that requires only minimal human intervention but works on descriptions of various plant and animal groups. This paper attempts to examine the hindering factors by identifying the mismatches between existing research and the characteristics of morphological descriptions.

RESULTS

This paper reviews the techniques that have been used for automated annotation, reports exploratory results on characteristics of morphological descriptions as a genre, and identifies challenges facing automated annotation systems. Based on these criteria, the paper proposes an overall strategy for converting descriptions of various taxon groups with the least human effort.

CONCLUSIONS

A combined unsupervised and supervised machine learning strategy is needed to construct domain ontologies and lexicons and to ultimately achieve automated semantic annotation of morphological descriptions. Further, we suggest that each effort in creating a new description or annotating an individual description collection should be shared and contribute to the "biodiversity information commons" for the Semantic Web. This cannot be done without a sound strategy and a close partnership between and among information scientists and biologists.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/043b/2887808/7f229ae62dc9/1471-2105-11-278-1.jpg
