EnvMine: a text-mining system for the automatic extraction of contextual information.

Affiliation

Centro Nacional de Biotecnología (CNB), CSIC, C/Darwin 3, 28049 Madrid, Spain.

Publication information

BMC Bioinformatics. 2010 Jun 1;11:294. doi: 10.1186/1471-2105-11-294.

DOI: 10.1186/1471-2105-11-294
PMID: 20515448
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2901371/
Abstract

BACKGROUND

For ecological studies, it is crucial to have adequate descriptions of the environments and samples being studied. Such a description must be given in terms of their physicochemical characteristics, allowing a direct comparison between different environments that would otherwise be difficult. The characterization must also include the precise geographical location, to make possible the study of geographical distributions and biogeographical patterns. Currently, there is no schema for annotating these environmental features, so these data have to be extracted from textual sources (published articles). So far, this has had to be done by manual inspection of the corresponding documents. To facilitate this task, we have developed EnvMine, a set of text-mining tools devoted to retrieving contextual information (physicochemical variables and geographical locations) from textual sources of any kind.

RESULTS

EnvMine is capable of retrieving the physicochemical variables cited in the text by accurately identifying their associated units of measurement. In this task, the system achieves a recall (percentage of items retrieved) of 92% with less than 1% error. A Bayesian classifier was also tested for distinguishing parts of the text describing environmental characteristics from others dealing with, for instance, experimental settings. Regarding the identification of geographical locations, the system takes advantage of existing databases such as GeoNames to achieve 86% recall with 92% precision. The identification of a location also includes the determination of its exact coordinates (latitude and longitude), thus allowing the calculation of distances between the individual locations.
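The idea of finding physicochemical variables through their units of measurement can be illustrated with a minimal Python sketch. This is not EnvMine's actual implementation: the unit table `UNIT_TO_VARIABLE`, the regular expression, and the function name `extract_measurements` are all assumptions made for illustration, and a real system would need a far larger unit vocabulary and better handling of ambiguous units.

```python
import re

# Hypothetical, deliberately tiny unit table (an assumption for illustration,
# not the vocabulary used by EnvMine): unit spelling -> variable it suggests.
UNIT_TO_VARIABLE = {
    "°C": "temperature",
    "mg/L": "concentration",
    "mM": "concentration",
    "psu": "salinity",
    "m": "depth or length",
}

NUMBER = r"[-+]?\d+(?:\.\d+)?"
# Try longer units first so that "mg/L" is preferred over the bare "m".
UNITS = "|".join(
    re.escape(unit) for unit in sorted(UNIT_TO_VARIABLE, key=len, reverse=True)
)
# The negative lookahead rejects units that are really the start of a word,
# e.g. the "m" in "5 months".
PATTERN = re.compile(rf"(?P<value>{NUMBER})\s*(?P<unit>{UNITS})(?![A-Za-z/])")


def extract_measurements(text):
    """Return (value, unit, variable) tuples for every number+unit pair found."""
    return [
        (float(m.group("value")), m.group("unit"), UNIT_TO_VARIABLE[m.group("unit")])
        for m in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "Samples were taken at 15 m depth, where the water was 4.5 °C "
        "with 35 psu and 2.1 mg/L oxygen."
    )
    for value, unit, variable in extract_measurements(sample):
        print(f"{value} {unit} -> {variable}")
```

Anchoring on the unit rather than on the variable name means a sentence does not have to mention "temperature" explicitly for a temperature value to be picked up, which matches the approach the abstract describes.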

CONCLUSION

EnvMine is a very efficient method for extracting contextual information from different text sources, such as published articles or web pages. This tool can help in determining the precise location and physicochemical variables of sampling sites, thus facilitating ecological analyses. EnvMine can also help in the development of standards for the annotation of environmental features.
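Once sampling sites have been resolved to latitude and longitude, the distance between two of them can be computed directly. The sketch below uses the haversine formula on a spherical Earth. The abstract does not say which distance formula EnvMine uses, so the choice of haversine, the function name `haversine_km`, and the mean radius constant are assumptions.

```python
from math import asin, cos, radians, sin, sqrt

# Mean Earth radius in kilometres; treating the Earth as a sphere is an
# assumption that keeps the example short.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = radians(lon2 - lon1)
    a = sin(delta_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(delta_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


if __name__ == "__main__":
    # Two made-up sampling sites, used only to show the call.
    print(round(haversine_km(40.0, -3.0, 41.0, 2.0), 1), "km")
```

The spherical approximation is usually adequate for comparing sampling sites at regional or global scale; an ellipsoidal model would be needed only where sub-kilometre accuracy matters.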


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0329/2901371/fd8ddcb28093/1471-2105-11-294-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0329/2901371/00284d86ffc5/1471-2105-11-294-1.jpg

