Boost-RS: boosted embeddings for recommender systems and its application to enzyme-substrate interaction prediction.

Affiliations

Department of Computer Science, Tufts University, Medford, MA 02155, USA.

Department of Chemical and Biological Engineering, Tufts University, Medford, MA 02155, USA.

Publication information

Bioinformatics. 2022 May 13;38(10):2832-2838. doi: 10.1093/bioinformatics/btac201.

DOI:10.1093/bioinformatics/btac201

PMID:35561204

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9113267/

Abstract

MOTIVATION

Despite experimental and curation efforts, the extent of enzyme promiscuity on substrates remains largely unexplored and underdocumented. Computational tools for exploring the enzyme-substrate interaction space can expedite experimentation and benefit applications such as constructing synthesis pathways for novel biomolecules, identifying the metabolic products of ingested compounds, and elucidating xenobiotic metabolism. Recommender systems (RS), which are currently unexplored for the enzyme-substrate interaction prediction problem, can be used to recommend enzymes for substrates, and vice versa. The performance of collaborative-filtering (CF) RSs, however, hinges on the quality of the embedding vectors of users and items (enzymes and substrates in our case). Importantly, enhancing CF embeddings with heterogeneous auxiliary data, especially relational data (e.g. hierarchies, pairwise relations or groupings), remains a challenge.
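To make the role of the embedding vectors concrete, here is a minimal sketch of how an embedding-based CF model could rank enzymes for a substrate. It assumes a plain dot-product score passed through a sigmoid, which is a common CF formulation but is not confirmed by the abstract as the scoring function used in Boost-RS. The names `interaction_score` and `recommend_enzymes` and the toy embedding values are hypothetical and are not learned from KEGG data.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def interaction_score(enzyme_vec, substrate_vec):
    """Predicted interaction probability: sigmoid of the embedding dot product."""
    dot = sum(e * s for e, s in zip(enzyme_vec, substrate_vec))
    return sigmoid(dot)


def recommend_enzymes(substrate_vec, enzyme_embeddings, top_k=2):
    """Rank enzymes for one substrate by predicted score, highest first."""
    scored = [
        (name, interaction_score(vec, substrate_vec))
        for name, vec in enzyme_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy, hand-picked embeddings (hypothetical values, not learned from data).
enzyme_embeddings = {
    "enzyme_A": [0.9, 0.1],
    "enzyme_B": [-0.5, 0.8],
    "enzyme_C": [0.2, 0.2],
}
substrate_vec = [1.0, 0.0]

ranking = recommend_enzymes(substrate_vec, enzyme_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the ranking depends only on the vectors, any improvement in recommendation quality has to come from better embeddings, which is the point the paragraph above makes.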

RESULTS

We propose an innovative general RS framework, termed Boost-RS, that enhances RS performance by 'boosting' embedding vectors through auxiliary data. Specifically, Boost-RS is trained and dynamically tuned on multiple relevant auxiliary learning tasks. Boost-RS utilizes contrastive learning tasks to exploit relational data. To show the efficacy of Boost-RS for the enzyme-substrate interaction prediction problem, we apply the Boost-RS framework to several baseline CF models. We show that each of our auxiliary tasks boosts learning of the embedding vectors, and that contrastive learning using Boost-RS outperforms attribute concatenation and multi-label learning. We also show that Boost-RS outperforms similarity-based models. Ablation studies and visualization of learned representations highlight the importance of using contrastive learning on some of the auxiliary data in boosting the embedding vectors.
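The idea of training embeddings on a main interaction task plus a contrastive auxiliary task can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses a triplet margin loss as one common contrastive formulation, a binary cross-entropy main loss, and a fixed `aux_weight`, whereas the abstract describes dynamic tuning across tasks without giving details. All function names and toy vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def bce_loss(enzyme_vec, substrate_vec, label):
    """Main CF task: binary cross-entropy on the sigmoid of the dot product."""
    p = 1.0 / (1.0 + math.exp(-dot(enzyme_vec, substrate_vec)))
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Auxiliary contrastive task (assumed form): the related pair should be
    closer than the unrelated pair by at least `margin`."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def combined_loss(main_losses, aux_losses, aux_weight=0.5):
    """Mean main loss plus a fixed-weight mean auxiliary loss (assumption:
    a constant weight stands in for the dynamic tuning the abstract mentions)."""
    mean_main = sum(main_losses) / len(main_losses)
    mean_aux = sum(aux_losses) / len(aux_losses)
    return mean_main + aux_weight * mean_aux


# Toy embeddings (hypothetical): e1 and e2 are related, e3 is unrelated.
e1, e2, e3 = [1.0, 0.0], [0.8, 0.2], [-1.0, 0.5]
substrate = [1.0, 1.0]

main_losses = [bce_loss(e1, substrate, 1), bce_loss(e3, substrate, 0)]
well_separated = triplet_loss(e1, e2, e3)  # relation already respected
violated = triplet_loss(e1, e3, e2)        # relation violated, so penalized
total = combined_loss(main_losses, [well_separated, violated])
print(f"main={sum(main_losses) / 2:.3f} aux={violated:.2f} total={total:.3f}")
```

In a real training loop the gradient of the combined loss would update the shared embeddings, so the relational signal from the auxiliary task shapes the same vectors that the recommender scores with.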

AVAILABILITY AND IMPLEMENTATION

A Python implementation for Boost-RS is provided at https://github.com/HassounLab/Boost-RS. The enzyme-substrate interaction data is available from the KEGG database (https://www.genome.jp/kegg/).
