Automated Methods Enable Direct Computation on Phenotypic Descriptions for Novel Candidate Gene Prediction.

Authors

Braun Ian R, Lawrence-Dill Carolyn J

Affiliations

Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States.

Interdepartmental Bioinformatics and Computational Biology, Iowa State University, Ames, IA, United States.

Publication information

Front Plant Sci. 2020 Jan 10;10:1629. doi: 10.3389/fpls.2019.01629. eCollection 2019.

DOI: 10.3389/fpls.2019.01629
PMID: 31998331
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6965352/

Abstract

Natural language descriptions of plant phenotypes are a rich source of information for genetics and genomics research. We computationally translated descriptions of plant phenotypes into structured representations that can be analyzed to identify biologically meaningful associations. These representations include the entity-quality (EQ) formalism, which uses terms from biological ontologies to represent phenotypes in a standardized, semantically rich format, as well as numerical vector representations generated using natural language processing (NLP) methods (such as the bag-of-words approach and document embedding). We compared resulting phenotype similarity measures to those derived from manually curated data to determine the performance of each method. Computationally derived EQ and vector representations were comparably successful in recapitulating biological truth to representations created through manual EQ statement curation. Moreover, NLP methods for generating vector representations of phenotypes are scalable to large quantities of text because they require no human input. These results indicate that it is now possible to computationally and automatically produce and populate large-scale information resources that enable researchers to query phenotypic descriptions directly.
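
To make the bag-of-words idea in the abstract concrete, the following is a minimal sketch of how free-text phenotype descriptions might be turned into count vectors and compared. It is not the authors' pipeline: the gene names and descriptions are invented for illustration, the tokenizer is a simple regular expression, and cosine similarity is assumed as the similarity measure because the abstract does not name one.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split it into alphabetic tokens, and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing tokens, so only shared tokens contribute.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarity(vectors: dict) -> dict:
    """Score every unordered pair of genes by the similarity of their vectors."""
    genes = sorted(vectors)
    return {
        (g, h): cosine_similarity(vectors[g], vectors[h])
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }


# Hypothetical phenotype descriptions, invented for illustration only.
descriptions = {
    "gene_a": "dwarf plant with short internodes and dark green leaves",
    "gene_b": "short plant with dark green leaves and reduced internodes",
    "gene_c": "kernels with purple aleurone",
}

vectors = {gene: bag_of_words(text) for gene, text in descriptions.items()}
scores = pairwise_similarity(vectors)

# Rank gene pairs from most to least similar phenotype description.
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

In this toy data the two dwarf-like descriptions score far higher with each other than either does with the kernel-color description, which is the kind of signal a candidate-gene ranking could build on. A realistic setup would likely add stop-word removal, term weighting, or learned document embeddings, all of which are outside this sketch.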
