Tirunagari Santosh, Saha Shyamasree, Venkatesan Aravind, Suveges Daniel, Carmona Miguel, Buniello Annalisa, Ochoa David, McEntyre Johanna, McDonagh Ellen, Harrison Melissa
Literature Services Team, European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge CB10 1SD, United Kingdom.
Open Targets, European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge CB10 1SD, United Kingdom.
Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf113.
The lit-OTAR framework, developed through a collaboration between Europe PMC and Open Targets, uses deep learning to advance drug discovery by extracting evidence from the scientific literature for drug target identification and validation. The framework combines named entity recognition, which identifies gene/protein (target), disease, organism, and chemical/drug mentions in scientific texts, with entity normalization, which maps these entities to databases such as Ensembl, the Experimental Factor Ontology, and ChEMBL. Continuously operational, it has processed over 39 million abstracts and 4.5 million full-text articles and preprints to date, identifying more than 48.5 million unique associations that help accelerate drug discovery and scientific research, including >29.9 million distinct target-disease, 11.8 million distinct target-drug, and 8.3 million distinct disease-drug relationships.
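The two stages described above (recognizing entity mentions, then normalizing them to identifiers) can be illustrated with a toy sketch. This is not the lit-OTAR implementation: a hand-written dictionary stands in for the deep learning models, the identifiers are placeholders rather than real Ensembl, Experimental Factor Ontology, or ChEMBL accessions, and sentence-level co-occurrence is assumed as the way associations are formed, which the abstract does not specify.

```python
import re
from itertools import combinations

# Hypothetical lexicon: surface form -> (entity type, placeholder identifier).
# A real system would use trained NER models and curated ontologies instead.
LEXICON = {
    "tp53": ("target", "TARGET:0001"),
    "p53": ("target", "TARGET:0001"),       # synonym normalized to the same ID
    "breast cancer": ("disease", "DISEASE:0001"),
    "tamoxifen": ("drug", "DRUG:0001"),
}


def tag_entities(sentence):
    """Return the set of (type, identifier) pairs found in one sentence."""
    lowered = sentence.lower()
    found = set()
    for surface, (etype, ident) in LEXICON.items():
        # Whole-word match so that "p53" is not matched inside "tp53".
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add((etype, ident))
    return found


def cooccurrences(sentences):
    """Collect distinct cross-type entity pairs that share a sentence."""
    pairs = set()
    for sentence in sentences:
        entities = sorted(tag_entities(sentence))
        for first, second in combinations(entities, 2):
            if first[0] != second[0]:  # skip pairs of the same entity type
                pairs.add((first, second))
    return pairs
```

Because synonyms such as "TP53" and "p53" normalize to one identifier, two sentences that mention the same target and disease under different names contribute a single distinct association, which is the sense in which the counts above are "distinct".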
The results are accessible through Europe PMC's SciLite web app (https://europepmc.org/) and its annotations API (https://europepmc.org/annotationsapi), as well as via the Open Targets Platform (https://platform.opentargets.org/). The daily pipeline is available at https://github.com/ML4LitS/otar-maintenance, and the Open Targets ETL processes are available at https://github.com/opentargets.
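As a rough illustration of programmatic access, the sketch below builds a request to the Europe PMC annotations API and flattens a response. The endpoint path, the parameter names (`articleIds`, `type`, `format`), the type value `Gene_Proteins`, and the response shape (a list of articles, each with `annotations` carrying `exact`, `type`, and `tags`) reflect my understanding of the public API and should be checked against the documentation linked above. The article identifier `MED:12345` is a placeholder, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the annotations API documentation.
BASE_URL = "https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds"


def build_annotations_url(article_ids, annotation_type=None, fmt="JSON"):
    """Build a query URL for article IDs such as 'MED:12345' (placeholder)."""
    params = {"articleIds": ",".join(article_ids), "format": fmt}
    if annotation_type is not None:
        params["type"] = annotation_type
    return BASE_URL + "?" + urlencode(params)


def fetch_annotations(article_ids, annotation_type=None):
    """Fetch and decode the JSON response (performs a network request)."""
    url = build_annotations_url(article_ids, annotation_type)
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def extract_tags(payload):
    """Flatten a response into (matched text, annotation type, tag URI) rows.

    Assumes the payload is a list of articles, each holding an 'annotations'
    list whose items have 'exact', 'type', and a list of 'tags' with 'uri'.
    """
    rows = []
    for article in payload:
        for annotation in article.get("annotations", []):
            for tag in annotation.get("tags", []):
                rows.append(
                    (annotation.get("exact"), annotation.get("type"), tag.get("uri"))
                )
    return rows
```

With network access, `extract_tags(fetch_annotations(["MED:12345"], "Gene_Proteins"))` would list the normalized gene/protein mentions for that article, assuming the identifier is replaced with a real one.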