Large-scale event extraction from literature with multi-level gene normalization.

Affiliation

Department of Plant Systems Biology, VIB, Gent, Belgium.

Publication information

PLoS One. 2013 Apr 17;8(4):e55814. doi: 10.1371/journal.pone.0055814. Print 2013.

DOI:10.1371/journal.pone.0055814
PMID:23613707
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3629104/
Abstract

Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked to information recorded in resources such as UniProt, KEGG, BioGRID and NCBI databases. In this study, we investigate how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text, mapping them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique genes and proteins and broad gene families. To this end, we have combined two state-of-the-art text mining components, previously evaluated on two community-wide challenges, and have extended and improved upon these methods by exploiting their complementary nature. Using these systems, we perform normalization and event extraction to create a large-scale resource that is publicly available, unique in semantic scope, and covers all 21.9 million PubMed abstracts and 460 thousand PubMed Central open access full-text articles. This dataset contains 40 million biomolecular events involving 76 million gene/protein mentions, linked to 122 thousand distinct genes from 5032 species across the full taxonomic tree. Detailed evaluations and analyses reveal promising results for application of this data in database and pathway curation efforts. The main software components used in this study are released under an open-source license. Further, the resulting dataset is freely accessible through a novel API, providing programmatic and customized access (http://www.evexdb.org/api/v001/). Finally, to allow for large-scale bioinformatic analyses, the entire resource is available for bulk download from http://evexdb.org/download/, under the Creative Commons - Attribution - Share Alike (CC BY-SA) license.
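
The idea of mapping a gene mention to identifiers at several levels of granularity can be illustrated with a minimal sketch. This is not the authors' implementation: the canonicalization rule (lowercasing and dropping non-alphanumeric characters), the lookup tables, and every identifier below are hypothetical placeholders chosen only to show the three levels named in the abstract (canonical symbol, unique gene, gene family).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Toy lookup tables. All identifiers are made-up placeholders,
# not real database accessions.
GENE_IDS = {
    ("esr1", "human"): "GENE-H1",
    ("esr1", "mouse"): "GENE-M1",
}
FAMILIES = {
    "GENE-H1": "FAMILY-ESR",
    "GENE-M1": "FAMILY-ESR",
}


@dataclass(frozen=True)
class Normalized:
    canonical_symbol: str      # coarsest level, always available
    gene_id: Optional[str]     # species-specific gene, if resolvable
    family_id: Optional[str]   # broad family grouping, if known


def canonicalize(mention: str) -> str:
    """Assumed rule: lowercase and drop non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9]", "", mention.lower())


def normalize(mention: str, species: Optional[str] = None) -> Normalized:
    """Map a text mention to identifiers at three levels of granularity."""
    symbol = canonicalize(mention)
    # A unique gene can only be assigned when the species is known.
    gene_id = GENE_IDS.get((symbol, species)) if species else None
    family_id = FAMILIES.get(gene_id) if gene_id else None
    return Normalized(symbol, gene_id, family_id)


print(normalize("Esr-1", "human"))
print(normalize("ESR 1"))
```

In this sketch a mention without a known species still receives a canonical symbol, so it can be grouped with other spellings of the same name even when no unique gene can be assigned.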

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0015/3629104/8ad5168f73f8/pone.0055814.g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0015/3629104/f001564fb9c9/pone.0055814.g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0015/3629104/b538e14012dd/pone.0055814.g003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0015/3629104/a09678db6f1b/pone.0055814.g004.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0015/3629104/00589980bf7d/pone.0055814.g005.jpg
