

Incentivising use of structured language in biological descriptions: Author-driven phenotype data and ontology production.

Authors

Cui Hong, Macklin James A, Sachs Joel, Reznicek Anton, Starr Julian, Ford Bruce, Penev Lyubomir, Chen Hsin-Liang

Affiliations

University of Arizona, Tucson, United States of America.

Agriculture and Agri-Food Canada, Ottawa, Canada.

Publication information

Biodivers Data J. 2018 Nov 7;(6):e29616. doi: 10.3897/BDJ.6.e29616. eCollection 2018.

DOI: 10.3897/BDJ.6.e29616
PMID: 30473620
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6235995/
Abstract

Phenotypes are used for a multitude of purposes such as defining species, reconstructing phylogenies, diagnosing diseases or improving crop and animal productivity, but most of this phenotypic data is published in free-text narratives that are not computable. This means that the complex relationship between the genome, the environment and phenotypes is largely inaccessible to analysis and important questions related to the evolution of organisms, their diseases or their response to climate change cannot be fully addressed. It takes great effort to manually convert free-text narratives to a computable format before they can be used in large-scale analyses. We argue that this manual curation approach is not a sustainable solution to produce computable phenotypic data for three reasons: 1) it does not scale to all of biodiversity; 2) it does not stop the publication of free-text phenotypes that will continue to need manual curation in the future; and, most importantly, 3) it does not solve the problem of inter-curator variation (curators interpret/convert a phenotype differently from each other). Our empirical studies have shown that inter-curator variation is as high as 40% even within a single project. With this level of variation, it is difficult to imagine that data integrated from multiple curation projects can be of high quality. The key causes of this variation have been identified as semantic vagueness in original phenotype descriptions and difficulties in using standardised vocabularies (ontologies). We argue that the authors describing phenotypes are the key to the solution. Given the right tools and appropriate attribution, the authors should be in charge of developing a project's semantics and ontology. This will speed up ontology development and improve the semantic clarity of phenotype descriptions from the moment of publication. A proof-of-concept project on this idea was funded by NSF ABI in July 2017. We seek readers' input or critique of the proposed approaches to help achieve community-based computable phenotype data production in the near future. Results from this project will be accessible through https://biosemantics.github.io/author-driven-production.
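To make the notion of inter-curator variation concrete, the following is a minimal sketch of one way such a figure could be computed: the fraction of pairwise curator comparisons in which two curators assigned different terms to the same phenotype statement. The abstract does not say how the 40% figure was measured, so this metric, the function name `pairwise_disagreement`, and the toy annotations are all illustrative assumptions, not the authors' method or data.

```python
from itertools import combinations


def pairwise_disagreement(annotations):
    """Return the share of pairwise comparisons where two curators differ.

    `annotations` maps a curator name to a dict of
    {phenotype statement id: ontology term chosen by that curator}.
    This is a hypothetical metric, not the one used in the paper.
    """
    disagreements = 0
    comparisons = 0
    for a, b in combinations(sorted(annotations), 2):
        # Only compare statements that both curators annotated.
        shared = annotations[a].keys() & annotations[b].keys()
        for statement_id in shared:
            comparisons += 1
            if annotations[a][statement_id] != annotations[b][statement_id]:
                disagreements += 1
    if comparisons == 0:
        raise ValueError("no shared phenotype statements to compare")
    return disagreements / comparisons


# Invented toy data: three curators converting the same three statements.
example = {
    "curator_1": {"leaf_shape": "ovate", "petal_color": "red", "stem_surface": "pubescent"},
    "curator_2": {"leaf_shape": "ovate", "petal_color": "crimson", "stem_surface": "pubescent"},
    "curator_3": {"leaf_shape": "elliptic", "petal_color": "red", "stem_surface": "pubescent"},
}

rate = pairwise_disagreement(example)
print(f"pairwise disagreement: {rate:.0%}")
```

In this toy case, four of the nine pairwise comparisons disagree. Exact string matching is the simplest possible comparison; a real study would more likely compare ontology term identifiers and might give partial credit for closely related terms.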



