Research Group Pharmaceutical Bioinformatics, Institute of Pharmaceutical Sciences, Albert-Ludwigs-Universität Freiburg, Freiburg 79104, Germany.
Bioinformatics. 2022 Sep 15;38(18):4452-4453. doi: 10.1093/bioinformatics/btac539.
Newly discovered functional relationships between (bio-)molecules are a key component of molecular biology and life science research. Especially in the drug discovery field, knowledge of how small molecules associate with proteins plays a fundamental role in understanding how drugs or metabolites can affect cells, tissues and human metabolism. Finding relevant information about these relationships among the huge number of published articles is becoming increasingly challenging and time-consuming. On average, more than 25 000 new (bio-)medical articles are added to the literature database PubMed every week. In this article, we present a new web server [compound-protein relationships in literature (CPRiL)] that provides information on functional relationships between small molecules and proteins described in the literature. Currently, CPRiL contains ∼465 000 unique names and synonyms of small molecules, ∼100 000 unique proteins and more than 9 million described functional relationships between these entities. The BioBERT machine learning model applied to determine functional relationships between small molecules and proteins in texts was extensively trained and tested. On a related benchmark, CPRiL achieved high performance, with an F1 score of 84.3%, precision of 82.9% and recall of 85.7%.
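As a quick consistency check on the reported scores, the F1 score is the harmonic mean of precision and recall. The sketch below only recomputes it from the two rounded percentages quoted in the abstract; the function name `f1_score` is a hypothetical helper for illustration and is not part of CPRiL or its benchmark code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract (rounded figures, so the result is approximate).
reported_precision = 0.829
reported_recall = 0.857

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.1%}")  # prints "F1 = 84.3%"
```

The recomputed value agrees with the reported F1 score of 84.3% to one decimal place.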
CPRiL is freely available at https://www.pharmbioinf.uni-freiburg.de/cpril.
Supplementary data are available at Bioinformatics online.