pubmed2ensembl: a resource for mining the biological literature on genes.

Affiliation

Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom.

Publication information

PLoS One. 2011;6(9):e24716. doi: 10.1371/journal.pone.0024716. Epub 2011 Sep 29.

DOI:10.1371/journal.pone.0024716
PMID:21980353
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3183000/
Abstract

BACKGROUND: The last two decades have witnessed a dramatic acceleration in the production of genomic sequence information and the publication of biomedical articles. Although genome sequence data and publications are two of the most heavily relied-upon sources of information for many biologists, very little effort has been made to systematically integrate data from genomic sequences directly with the biological literature. For a limited number of model organisms, dedicated teams manually curate publications about genes; however, for species with no such dedicated staff, many thousands of articles are never mapped to genes or genomic regions.

METHODOLOGY/PRINCIPAL FINDINGS: To overcome the lack of integration between genomic data and the biological literature, we have developed pubmed2ensembl (http://www.pubmed2ensembl.org), an extension to the BioMart system that links over 2,000,000 articles in PubMed to nearly 150,000 genes in Ensembl from 50 species. We draw gene-publication links from several curated sources (e.g., Entrez Gene) and automatically generated sources (e.g., gene names extracted by text-mining MEDLINE records), allowing users to filter and combine these data sources to suit their individual needs for information extraction and biological discovery. In addition to extending the Ensembl BioMart database to include published information on genes, we also implemented a scripting language for automated BioMart construction and a novel BioMart interface that allows text-based queries against PubMed and PubMed Central documents to be combined with constraints on genomic features. Finally, we illustrate the potential of pubmed2ensembl through typical use cases that involve integrated queries across the biomedical literature and genomic data.
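To make "filter and combine different data sources" concrete, here is a minimal Python sketch (Python 3.9+). It is not pubmed2ensembl's code or schema: it assumes each gene-publication link is a simple (gene ID, PMID, source) record, and the source labels `"entrez_gene"` and `"text_mining"`, the gene IDs, and the PMIDs are hypothetical placeholders. Users pick which sources to trust and whether a link needs support from any of them (union) or all of them (intersection).

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Link(NamedTuple):
    """One gene-publication link (hypothetical record layout)."""

    gene_id: str  # Ensembl-style gene ID (placeholder values below)
    pmid: int     # PubMed ID
    source: str   # e.g. "entrez_gene" (curated) or "text_mining" (automatic)


def combine_links(
    links: Iterable[Link], sources: set[str], mode: str = "union"
) -> dict[str, set[int]]:
    """Map each gene to the PMIDs supported by the selected sources.

    mode="union":        keep a PMID if any selected source links it to the gene.
    mode="intersection": keep it only if every selected source does.
    """
    if mode not in ("union", "intersection"):
        raise ValueError(f"unknown mode: {mode!r}")

    # gene -> PMID -> set of selected sources that support that link
    support = defaultdict(lambda: defaultdict(set))
    for link in links:
        if link.source in sources:
            support[link.gene_id][link.pmid].add(link.source)

    result: dict[str, set[int]] = {}
    for gene, pmids in support.items():
        if mode == "union":
            kept = set(pmids)
        else:
            kept = {pmid for pmid, srcs in pmids.items() if srcs == sources}
        if kept:  # drop genes left with no supported articles
            result[gene] = kept
    return result


# Hypothetical example data
links = [
    Link("ENSG_A", 111, "entrez_gene"),
    Link("ENSG_A", 111, "text_mining"),
    Link("ENSG_A", 222, "text_mining"),
    Link("ENSG_B", 333, "entrez_gene"),
]
both = {"entrez_gene", "text_mining"}
print(combine_links(links, both, mode="intersection"))  # {'ENSG_A': {111}}
```

The intersection mode trades recall for precision: text-mined links confirmed by a curated source are kept, while links seen by only one source are dropped.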

CONCLUSION/SIGNIFICANCE: By allowing biologists to find the relevant literature on specific genomic regions or sets of functionally related genes more easily, pubmed2ensembl offers a much-needed, genome-informatics-inspired solution for accessing the ever-increasing biomedical literature.
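Finding literature on a specific genomic region amounts to an interval-overlap filter on gene coordinates followed by a lookup of linked articles, optionally narrowed by a text query. In pubmed2ensembl such combined queries went through its BioMart interface; the Python sketch below only illustrates the idea. It assumes genes are (gene ID, chromosome, start, end) records with 1-based inclusive coordinates, that gene-to-PMID links and article text are available as plain dictionaries, and that a case-insensitive substring match stands in for a real full-text search. All IDs, coordinates, and texts are hypothetical.

```python
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    """A gene's location (hypothetical record layout)."""

    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive


def genes_in_region(genes: list[Gene], chrom: str, start: int, end: int) -> list[Gene]:
    """Return genes overlapping chrom:start-end. Two closed intervals overlap
    when each one starts no later than the other one ends."""
    return [g for g in genes if g.chrom == chrom and g.start <= end and g.end >= start]


def literature_for_region(
    genes: list[Gene],
    gene_to_pmids: dict[str, set[int]],
    abstracts: dict[int, str],
    chrom: str,
    start: int,
    end: int,
    keyword: Optional[str] = None,
) -> dict[str, set[int]]:
    """Collect PMIDs linked to genes in the region, optionally keeping only
    articles whose text contains `keyword` (case-insensitive substring)."""
    hits: dict[str, set[int]] = {}
    for gene in genes_in_region(genes, chrom, start, end):
        pmids = gene_to_pmids.get(gene.gene_id, set())
        if keyword is not None:
            kw = keyword.lower()
            pmids = {p for p in pmids if kw in abstracts.get(p, "").lower()}
        if pmids:
            hits[gene.gene_id] = set(pmids)  # copy so callers can't mutate the input
    return hits


# Hypothetical example data
genes = [
    Gene("GENE_1", "X", 1_000, 5_000),
    Gene("GENE_2", "X", 4_500, 9_000),
    Gene("GENE_3", "2", 1_000, 5_000),
]
gene_to_pmids = {"GENE_1": {101, 102}, "GENE_2": {103}, "GENE_3": {104}}
abstracts = {
    101: "GENE_1 expression in cortical neurons.",
    102: "Crystal structure of a kinase domain.",
    103: "GENE_2 knockout alters neuronal migration.",
    104: "Neuronal markers on chromosome 2.",
}
print(literature_for_region(genes, gene_to_pmids, abstracts, "X", 4_000, 4_800, keyword="neuron"))
# {'GENE_1': {101}, 'GENE_2': {103}}
```

Swapping the region filter for a list of functionally related gene IDs gives the other use case named above; the keyword step stays the same.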


Figures:
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9edf/3183000/b8f823338bf9/pone.0024716.g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9edf/3183000/9f93d26c3119/pone.0024716.g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9edf/3183000/3a0ab9d7c0bf/pone.0024716.g003.jpg

Similar articles

1
pubmed2ensembl: a resource for mining the biological literature on genes.
PLoS One. 2011;6(9):e24716. doi: 10.1371/journal.pone.0024716. Epub 2011 Sep 29.
2
BioMart--biological queries made easy.
BMC Genomics. 2009 Jan 14;10:22. doi: 10.1186/1471-2164-10-22.
3
MILANO--custom annotation of microarray results using automatic literature searches.
BMC Bioinformatics. 2005 Jan 20;6:12. doi: 10.1186/1471-2105-6-12.
4
Textpresso: an ontology-based information retrieval and extraction system for biological literature.
PLoS Biol. 2004 Nov;2(11):e309. doi: 10.1371/journal.pbio.0020309. Epub 2004 Sep 21.
5
Text-mining of PubMed abstracts by natural language processing to create a public knowledge base on molecular mechanisms of bacterial enteropathogens.
BMC Bioinformatics. 2009 Jun 10;10:177. doi: 10.1186/1471-2105-10-177.
6
Using the Ensembl genome server to browse genomic sequence data.
Curr Protoc Bioinformatics. 2007 Jan;Chapter 1:Unit 1.15. doi: 10.1002/0471250953.bi0115s16.
7
Statistical Viewer: a tool to upload and integrate linkage and association data as plots displayed within the Ensembl genome browser.
BMC Bioinformatics. 2005 Apr 12;6:95. doi: 10.1186/1471-2105-6-95.
8
GeneNotes--a novel information management software for biologists.
BMC Bioinformatics. 2005 Feb 1;6:20. doi: 10.1186/1471-2105-6-20.
9
Searching the Mouse Genome Informatics (MGI) resources for information on mouse biology from genotype to phenotype.
Curr Protoc Bioinformatics. 2004 May;Chapter 1:Unit 1.7. doi: 10.1002/0471250953.bi0107s05.
10
Atlas - a data warehouse for integrative bioinformatics.
BMC Bioinformatics. 2005 Feb 21;6:34. doi: 10.1186/1471-2105-6-34.

Cited by

1
Deep Learning-Based Drug Compounds Discovery for Gynecomastia.
Biomedicines. 2025 Jan 21;13(2):262. doi: 10.3390/biomedicines13020262.
2
Publication, funding, and experimental data in support of Human Reference Atlas construction and usage.
Sci Data. 2024 Jun 4;11(1):574. doi: 10.1038/s41597-024-03416-8.
3
Investigation of anti-depression effects and potential mechanisms of the ethyl acetate extract of Rupr. through the integration of experiments, LC-MS/MS chemical analysis, and a systems biology approach.
Front Pharmacol. 2023 Oct 25;14:1239197. doi: 10.3389/fphar.2023.1239197. eCollection 2023.
4
Molecular Mechanisms of Resistance to Ionizing Radiation in and Its Relationship with Aging, Oxidative Stress, and Antioxidant Activity.
Antioxidants (Basel). 2023 Aug 30;12(9):1690. doi: 10.3390/antiox12091690.
5
Potential alternative drug treatment for bone giant cell tumor.
Front Cell Dev Biol. 2023 Jun 13;11:1193217. doi: 10.3389/fcell.2023.1193217. eCollection 2023.
6
Drug Discovery in Canine Pyometra Disease Identified by Text Mining and Microarray Data Analysis.
Biomed Res Int. 2023 Apr 17;2023:7839568. doi: 10.1155/2023/7839568. eCollection 2023.
7
A Four-Gene Signature Associated with Radioresistance in Head and Neck Squamous Cell Carcinoma Identified by Text Mining and Data Analysis.
Comput Math Methods Med. 2022 Sep 27;2022:5693806. doi: 10.1155/2022/5693806. eCollection 2022.
8
Text Mining-Based Drug Discovery for Connective Tissue Disease-Associated Pulmonary Arterial Hypertension.
Front Pharmacol. 2022 Mar 18;13:743210. doi: 10.3389/fphar.2022.743210. eCollection 2022.
9
A Computational Text Mining-Guided Meta-Analysis Approach to Identify Potential Xerostomia Drug Targets.
J Clin Med. 2022 Mar 5;11(5):1442. doi: 10.3390/jcm11051442.
10
Association between chronic periodontitis and the risk of Alzheimer's disease: combination of text mining and GEO dataset.
BMC Oral Health. 2021 Sep 23;21(1):466. doi: 10.1186/s12903-021-01827-2.