Facilitating the development of controlled vocabularies for metabolomics technologies with text mining.

Authors

Spasić Irena, Schober Daniel, Sansone Susanna-Assunta, Rebholz-Schuhmann Dietrich, Kell Douglas B, Paton Norman W

Affiliation

Manchester Centre for Integrative Systems Biology, The University of Manchester, 131 Princess Street, Manchester, M1 7ND, UK.

Publication

BMC Bioinformatics. 2008 Apr 29;9 Suppl 5(Suppl 5):S5. doi: 10.1186/1471-2105-9-S5-S5.

DOI:10.1186/1471-2105-9-S5-S5
PMID:18460187
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2367623/

Abstract

BACKGROUND

Many bioinformatics applications rely on controlled vocabularies or ontologies to consistently interpret and seamlessly integrate information scattered across public resources. Experimental data sets from metabolomics studies need to be integrated with one another, but also with data produced by other types of omics studies in the spirit of systems biology, hence the pressing need for vocabularies and ontologies in metabolomics. However, constructing these resources manually is time-consuming and non-trivial.

RESULTS

We describe a methodology for rapid development of controlled vocabularies, a study originally motivated by the need for vocabularies describing metabolomics technologies. We present case studies involving two controlled vocabularies (for nuclear magnetic resonance spectroscopy and gas chromatography) whose development is currently underway as part of the Metabolomics Standards Initiative. The initial vocabularies were compiled manually and contained 243 and 152 terms, respectively. A further 5,699 and 2,612 new terms, respectively, were acquired automatically from the literature. Analysis of the results showed that full-text articles (especially the Materials and Methods sections), rather than paper abstracts, are the major source of technology-specific terms.
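
To put the reported counts in proportion, the short calculation below divides the number of automatically acquired terms by the size of each manually compiled vocabulary. It assumes the figures are listed in the same order as the two technologies (NMR spectroscopy first, gas chromatography second); the abstract does not say how many acquired terms were retained after curation, so the ratios describe candidate terms only. The names `seed_terms`, `acquired_terms`, and `expansion_ratio` are illustrative.

```python
# Term counts quoted in the abstract; the pairing with each technology
# is an assumption based on the order in which they are listed.
seed_terms = {"NMR spectroscopy": 243, "gas chromatography": 152}
acquired_terms = {"NMR spectroscopy": 5699, "gas chromatography": 2612}


def expansion_ratio(seed_count, acquired_count):
    """Return how many acquired terms there are per manually compiled term."""
    return acquired_count / seed_count


ratios = {
    name: round(expansion_ratio(seed_terms[name], acquired_terms[name]), 1)
    for name in seed_terms
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio} acquired terms per seed term")
```

On these figures, text mining proposed roughly 23 candidate terms per seed term for NMR spectroscopy and roughly 17 for gas chromatography.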

CONCLUSIONS

We suggest a text mining method for efficient corpus-based term acquisition as a way of rapidly expanding a set of controlled vocabularies with the terms used in the scientific literature. We adopted an integrative approach, combining relatively generic software and data resources for time- and cost-effective development of a text mining tool for expansion of controlled vocabularies across various domains, as a practical alternative to both manual term collection and tailor-made named entity recognition methods.
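
The general idea of corpus-based term acquisition can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it uses plain n-gram counting with a small stopword filter as a stand-in for a proper term recogniser, and the names `STOPWORDS`, `tokenize`, and `candidate_terms`, along with the `max_len` and `min_count` parameters, are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list (an assumption of this sketch).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "to",
             "with", "by", "was", "were", "is", "are", "at", "from", "before"}


def tokenize(segment):
    """Lowercase a text segment and split it into (optionally hyphenated) tokens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", segment.lower())


def candidate_terms(corpus, seed_vocabulary, max_len=3, min_count=2):
    """Return (phrase, count) pairs for n-grams that are not in the seed
    vocabulary, do not start or end with a stopword, and occur at least
    min_count times across the corpus, most frequent first."""
    seed = {term.lower() for term in seed_vocabulary}
    counts = Counter()
    for document in corpus:
        # Split on punctuation so that n-grams do not span clause boundaries.
        for segment in re.split(r"[.,;:()!?]", document):
            tokens = tokenize(segment)
            for n in range(1, max_len + 1):
                for i in range(len(tokens) - n + 1):
                    gram = tokens[i:i + n]
                    if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                        continue
                    phrase = " ".join(gram)
                    if phrase not in seed:
                        counts[phrase] += 1
    return [(p, c) for p, c in counts.most_common() if c >= min_count]
```

Calling `candidate_terms(methods_sections, ["gas chromatography"])` on a list of Materials and Methods texts would return frequent phrases that are absent from the seed list, which a curator could then review before adding them to the vocabulary. A real system would replace the frequency filter with a dedicated term-recognition measure and linguistic filtering.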
