Diseases 2.0: a weekly updated database of disease-gene associations from text mining and data integration.

Affiliations

Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 2200, Denmark.

Department of Internal Medicine, Division of Translational Informatics, University of New Mexico Health Sciences Center, Albuquerque, NM, USA.

Publication information

Database (Oxford). 2022 Mar 28;2022. doi: 10.1093/database/baac019.

DOI: 10.1093/database/baac019
PMID: 35348648
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9216524/

Abstract

The scientific knowledge about which genes are involved in which diseases grows rapidly, which makes it difficult to keep up with new publications and genetics datasets. The DISEASES database aims to provide a comprehensive overview by systematically integrating and assigning confidence scores to evidence for disease-gene associations from curated databases, genome-wide association studies (GWAS) and automatic text mining of the biomedical literature. Here, we present a major update to this resource, which greatly increases the number of associations from all these sources. This is especially true for the text-mined associations, which have increased by at least 9-fold at all confidence cutoffs. We show that this dramatic increase is primarily due to adding full-text articles to the text corpus, secondarily due to improvements to both the disease and gene dictionaries used for named entity recognition, and only to a very small extent due to the growth in number of PubMed abstracts. DISEASES now also makes use of a new GWAS database, Target Illumination by GWAS Analytics, which considerably increased the number of GWAS-derived disease-gene associations. DISEASES itself is also integrated into several other databases and resources, including GeneCards/MalaCards, Pharos/Target Central Resource Database and the Cytoscape stringApp. All data in DISEASES are updated on a weekly basis and are available via a web interface at https://diseases.jensenlab.org, from where they can also be downloaded under open licenses. Database URL: https://diseases.jensenlab.org.
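
The abstract's claim of an increase "at all confidence cutoffs" can be illustrated with a minimal sketch. It counts the associations at or above each cutoff in two releases and takes the ratio. Everything here is an assumption made for illustration: the `Association` record, the function names, the cutoff values and the toy data are hypothetical and do not reflect the actual DISEASES file format or scoring scheme.

```python
from typing import Dict, Iterable, List, NamedTuple


class Association(NamedTuple):
    """Hypothetical record; not the real DISEASES download layout."""
    gene: str
    disease: str
    confidence: float  # assumed: a higher value means stronger evidence


def count_at_cutoffs(assocs: Iterable[Association],
                     cutoffs: List[float]) -> Dict[float, int]:
    """Count associations whose confidence is at or above each cutoff."""
    items = list(assocs)
    return {c: sum(1 for a in items if a.confidence >= c) for c in cutoffs}


def fold_increase(old: Iterable[Association],
                  new: Iterable[Association],
                  cutoffs: List[float]) -> Dict[float, float]:
    """Ratio of new to old counts per cutoff; cutoffs with no old data are skipped."""
    old_counts = count_at_cutoffs(old, cutoffs)
    new_counts = count_at_cutoffs(new, cutoffs)
    return {c: new_counts[c] / old_counts[c] for c in cutoffs if old_counts[c] > 0}


# Toy data, invented purely for this example.
old_release = [
    Association("G1", "D1", 1.5),
    Association("G2", "D1", 3.2),
]
new_release = [
    Association("G1", "D1", 2.0),
    Association("G2", "D1", 3.5),
    Association("G3", "D2", 1.2),
    Association("G4", "D2", 3.1),
]

result = fold_increase(old_release, new_release, cutoffs=[1.0, 3.0])
print(result)
```

Reporting the ratio per cutoff, rather than one overall number, shows whether growth is spread across the score range or concentrated among low-confidence associations.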

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/09cb/9216524/b04e5df57153/baac019f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/09cb/9216524/56d4dabd7df8/baac019f1.jpg

Similar articles

1
Diseases 2.0: a weekly updated database of disease-gene associations from text mining and data integration.
Database (Oxford). 2022 Mar 28;2022. doi: 10.1093/database/baac019.
2
DISEASES: text mining and data integration of disease-gene associations.
Methods. 2015 Mar;74:83-9. doi: 10.1016/j.ymeth.2014.11.020. Epub 2014 Dec 5.
3
Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research.
BMC Bioinformatics. 2015 Feb 21;16:55. doi: 10.1186/s12859-015-0472-9.
4
The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text.
PLoS One. 2013 Jun 18;8(6):e65390. doi: 10.1371/journal.pone.0065390. Print 2013.
5
miRiaD: A Text Mining Tool for Detecting Associations of microRNAs with Diseases.
J Biomed Semantics. 2016 Apr 29;7(1):9. doi: 10.1186/s13326-015-0044-y.
6
FamPlex: a resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining.
BMC Bioinformatics. 2018 Jun 28;19(1):248. doi: 10.1186/s12859-018-2211-5.
7
Text mining facilitates database curation - extraction of mutation-disease associations from Bio-medical literature.
BMC Bioinformatics. 2015 Jun 6;16:185. doi: 10.1186/s12859-015-0609-x.
8
Linking common human diseases to their phenotypes; development of a resource for human phenomics.
J Biomed Semantics. 2021 Aug 23;12(1):17. doi: 10.1186/s13326-021-00249-x.
9
IBDDB: a manually curated and text-mining-enhanced database of genes involved in inflammatory bowel disease.
Database (Oxford). 2021 Apr 30;2021. doi: 10.1093/database/baab022.
10
Interactome of the hepatitis C virus: Literature mining with ANDSystem.
Virus Res. 2016 Jun 15;218:40-8. doi: 10.1016/j.virusres.2015.12.003. Epub 2015 Dec 7.

Articles citing this paper

1
Quantifying compatibility mechanisms in traditional Chinese medicine with interpretable graph neural networks.
J Pharm Anal. 2025 Aug;15(8):101342. doi: 10.1016/j.jpha.2025.101342. Epub 2025 May 12.
2
Enhanced genetic fine mapping accuracy with Bayesian Linear Regression models in diverse genetic architectures.
PLoS Genet. 2025 Jul 30;21(7):e1011783. doi: 10.1371/journal.pgen.1011783. eCollection 2025 Jul.
3
SenSet, a novel human lung senescence cell gene signature, identifies cell-specific senescence mechanisms.
bioRxiv. 2024 Dec 22:2024.12.21.629928. doi: 10.1101/2024.12.21.629928.
4
A bioinformatic analysis to systematically unveil shared pathways and molecular mechanisms underlying monkeypox and its predominant neurological manifestations.
Front Cell Infect Microbiol. 2025 Jul 2;15:1506687. doi: 10.3389/fcimb.2025.1506687. eCollection 2025.
5
Recovering time-varying networks from single-cell data.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i628-i636. doi: 10.1093/bioinformatics/btaf210.
6
GeneHarmony: A Knowledge-Based Tool for Biomarker Discovery in Disease: Sjögren's Disease vs. Rheumatoid Arthritis and Systemic Lupus Erythematosus.
Int J Mol Sci. 2025 Jul 2;26(13):6379. doi: 10.3390/ijms26136379.
7
Darling (v2.0): Mining disease-related databases for the detection of biomedical entity associations.
Comput Struct Biotechnol J. 2025 Jun 14;27:2626-2637. doi: 10.1016/j.csbj.2025.06.025. eCollection 2025.
8
De Novo Missense Variant in Bovine WDR33 Associated With a Complex Syndromic Form of Cleft Palate With Pentalogy of Fallot and Internal Hydrocephalus.
J Vet Intern Med. 2025 Jul-Aug;39(4):e70144. doi: 10.1111/jvim.70144.
9
Targeting protein disorder: the next hurdle in drug discovery.
Nat Rev Drug Discov. 2025 Jun 9. doi: 10.1038/s41573-025-01220-6.
10
EndoMAP.v1 charts the structural landscape of human early endosome complexes.
Nature. 2025 May 28. doi: 10.1038/s41586-025-09059-y.

References cited in this paper

1
A Novel Metric to Quantify the Effect of Pathway Enrichment Evaluation With Respect to Biomedical Text-Mined Terms: Development and Feasibility Study.
JMIR Med Inform. 2021 Jun 18;9(6):e28247. doi: 10.2196/28247.
2
TIGA: target illumination GWAS analytics.
Bioinformatics. 2021 Nov 5;37(21):3865-3873. doi: 10.1093/bioinformatics/btab427.
3
The fight against fake-paper factories that churn out sham science.
Nature. 2021 Mar;591(7851):516-519. doi: 10.1038/d41586-021-00733-5.
4
A Novel Text-Mining Approach for Retrieving Pharmacogenomics Associations From the Literature.
Front Pharmacol. 2020 Nov 10;11:602030. doi: 10.3389/fphar.2020.602030. eCollection 2020.
5
Open Targets Platform: supporting systematic drug-target identification and prioritisation.
Nucleic Acids Res. 2021 Jan 8;49(D1):D1302-D1310. doi: 10.1093/nar/gkaa1027.
6
TCRD and Pharos 2021: mining the human proteome for disease biology.
Nucleic Acids Res. 2021 Jan 8;49(D1):D1334-D1346. doi: 10.1093/nar/gkaa993.
7
A compendium of mutational cancer driver genes.
Nat Rev Cancer. 2020 Oct;20(10):555-572. doi: 10.1038/s41568-020-0290-x. Epub 2020 Aug 10.
8
The DisGeNET knowledge platform for disease genomics: 2019 update.
Nucleic Acids Res. 2020 Jan 8;48(D1):D845-D855. doi: 10.1093/nar/gkz1021.
9
GWAS Central: a comprehensive resource for the discovery and comparison of genotype and phenotype data from genome-wide association studies.
Nucleic Acids Res. 2020 Jan 8;48(D1):D933-D940. doi: 10.1093/nar/gkz895.
10
Geneshot: search engine for ranking genes from arbitrary text queries.
Nucleic Acids Res. 2019 Jul 2;47(W1):W571-W577. doi: 10.1093/nar/gkz393.