A drug target slim: using gene ontology and gene ontology annotations to navigate protein-ligand target space in ChEMBL.

Author information

Mutowo Prudence, Bento A Patrícia, Dedman Nathan, Gaulton Anna, Hersey Anne, Lomax Jane, Overington John P

Affiliations

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

Publication information

J Biomed Semantics. 2016 Sep 27;7(1):59. doi: 10.1186/s13326-016-0102-0.

DOI:10.1186/s13326-016-0102-0

PMID:27678076

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5039825/

Abstract

BACKGROUND

Discovering new drugs is a lengthy and expensive process. Modern drug discovery relies heavily on the rapid identification of novel 'targets', usually proteins that can be modulated by small molecule drugs to cure or minimise the effects of a disease. Of the 20,000 proteins currently reported as comprising the human proteome, just under a quarter can potentially be modulated by known small molecules. Storing information in curated, actively maintained drug discovery databases can help researchers access current drug discovery information quickly. However, as the amount of data generated by both experimental and in silico efforts increases, databases can grow very large very quickly, and retrieving information from them can become a challenge. Developing database tools that facilitate rapid information retrieval is important for keeping pace with this growth.

DESCRIPTION

We have developed a Gene Ontology-based navigation tool (Gene Ontology Tree) to help users retrieve biological information for single protein targets in the ChEMBL drug discovery database. 99% of single protein targets in ChEMBL have at least one GO annotation associated with them. There are 12,500 GO terms associated with 6,200 protein targets in the ChEMBL database, resulting in a total of 140,000 annotations. The slim we have created, the 'ChEMBL protein target slim', allows broad categorisation of the biology of 90% of the protein targets using just 300 high-level, informative GO terms. To build it, we used the GO slim method, which groups numerous very specific lower-level terms derived from the Gene Ontology Annotation (GOA) resource under a smaller number of higher-level GO terms, to describe a set of GO terms relevant to proteins in ChEMBL. We then used the slim to provide a web-based tool that allows quick and easy navigation of protein target space. Terms from the GO are used to capture information on protein molecular function, biological process and subcellular localisation. The ChEMBL database also provides compound information for small molecules that have been tested for their effects on these protein targets. The 'ChEMBL protein target slim' firstly provides a means of describing the biology of protein drug targets and secondly allows users to easily establish a connection between biological and chemical information regarding drugs and drug targets in ChEMBL. The 'ChEMBL protein target slim' is available as a browsable 'Gene Ontology Tree' on the ChEMBL site under the browse targets tab ( https://www.ebi.ac.uk/chembl/target/browser ). A ChEMBL protein target slim OBO file containing the GO slim terms pertinent to ChEMBL is available from the GOC website ( http://geneontology.org/page/go-slim-and-subset-guide ).
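The GO slim idea described above (rolling specific annotations up to a small set of high-level terms) can be illustrated with a minimal sketch. This is not the authors' pipeline: the term hierarchy is a deliberately simplified toy rather than the real GO graph, and the target names, the `SLIM` set and the helper functions are hypothetical names introduced only for illustration.

```python
# Toy "is_a" hierarchy: child term -> list of parent terms.
# Simplified for illustration; NOT the actual structure of the Gene Ontology.
PARENTS = {
    "protein tyrosine kinase activity": ["protein kinase activity"],
    "protein kinase activity": ["kinase activity"],
    "kinase activity": ["catalytic activity"],
    "G protein-coupled receptor activity": ["signaling receptor activity"],
    "catalytic activity": ["molecular_function"],
    "signaling receptor activity": ["molecular_function"],
    "molecular_function": [],
}

# Hypothetical slim: the high-level terms chosen for browsing.
SLIM = {"kinase activity", "signaling receptor activity"}


def ancestors(term, parents):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def map_to_slim(annotations, parents, slim):
    """Map each target's specific terms to the slim terms among their ancestors."""
    slimmed = {}
    for target, terms in annotations.items():
        slim_terms = set()
        for term in terms:
            slim_terms |= ancestors(term, parents) & slim
        slimmed[target] = slim_terms
    return slimmed


def coverage(slimmed):
    """Fraction of targets that received at least one slim term."""
    if not slimmed:
        return 0.0
    return sum(1 for terms in slimmed.values() if terms) / len(slimmed)


# Hypothetical annotations: target -> set of specific terms.
ANNOTATIONS = {
    "target_A": {"protein tyrosine kinase activity"},
    "target_B": {"G protein-coupled receptor activity"},
    "target_C": {"catalytic activity"},  # too general to fall under any slim term
}

slimmed = map_to_slim(ANNOTATIONS, PARENTS, SLIM)
print(slimmed)
print(f"coverage: {coverage(slimmed):.0%}")
```

In this toy case `target_C` is annotated only with a term above the slim level, so it stays uncategorised. This is one way a slim can end up covering most, but not all, of the annotated targets.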

CONCLUSIONS

We have created a protein target navigation tool based on the 'ChEMBL protein target slim'. The 'ChEMBL protein target slim' provides a way of browsing protein targets in ChEMBL using high-level GO terms that describe the molecular functions, processes and subcellular localisations of protein drug targets in drug discovery. The tool also allows users to link ontological groupings representing protein target biology with relevant compound information in ChEMBL. Using a simple example, we have demonstrated how the 'ChEMBL protein target slim' can be used to link biological processes with drug information held in the ChEMBL database. The tool has the potential to aid areas of drug discovery such as drug repurposing studies or the study of drug-disease-protein pathways.
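The link from a slim term to compound information can be pictured as a two-step lookup: slim term to targets, then targets to tested compounds. The sketch below uses small in-memory tables with placeholder identifiers; it does not reflect the ChEMBL schema or any ChEMBL API, and all names in it are hypothetical.

```python
# Hypothetical lookup tables with placeholder identifiers (not real ChEMBL data).
SLIM_TO_TARGETS = {
    "kinase activity": ["target_A", "target_D"],
    "signaling receptor activity": ["target_B"],
}
TARGET_TO_COMPOUNDS = {
    "target_A": ["compound_1", "compound_2"],
    "target_B": ["compound_3"],
    "target_D": ["compound_2"],
}


def compounds_for_term(term, slim_to_targets, target_to_compounds):
    """Return the sorted, de-duplicated compounds tested against any target under a slim term."""
    found = set()
    for target in slim_to_targets.get(term, []):
        found.update(target_to_compounds.get(target, []))
    return sorted(found)


print(compounds_for_term("kinase activity", SLIM_TO_TARGETS, TARGET_TO_COMPOUNDS))
```

A compound tested against several targets under the same term (here `compound_2`) is reported once, which keeps the result usable as a browse list.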

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dfb3/5039825/65681691d899/13326_2016_102_Fig1_HTML.jpg
