Department of Computer Science, Tufts University, Medford, MA 02155, USA.
Department of Chemical and Biological Engineering, Tufts University, Medford, MA 02155, USA.
Bioinformatics. 2022 May 13;38(10):2832-2838. doi: 10.1093/bioinformatics/btac201.
Despite experimental and curation efforts, the extent of enzyme promiscuity on substrates remains largely unexplored and underdocumented. Computational tools for exploring the enzyme-substrate interaction space can expedite experimentation and benefit applications such as constructing synthesis pathways for novel biomolecules, identifying products of metabolism on ingested compounds, and elucidating xenobiotic metabolism. Recommender systems (RS), which are currently unexplored for the enzyme-substrate interaction prediction problem, can be used to recommend enzymes for substrates, and vice versa. The performance of collaborative-filtering (CF) RSs, however, hinges on the quality of the embedding vectors of users and items (enzymes and substrates in our case). Importantly, enhancing CF embeddings with heterogeneous auxiliary data, especially relational data (e.g. hierarchical, pairwise or groupings), remains a challenge.
We propose an innovative general RS framework, termed Boost-RS, that enhances RS performance by 'boosting' embedding vectors through auxiliary data. Specifically, Boost-RS is trained and dynamically tuned on multiple relevant auxiliary learning tasks. Boost-RS utilizes contrastive learning tasks to exploit relational data. To show the efficacy of Boost-RS for the enzyme-substrate interaction prediction problem, we apply the Boost-RS framework to several baseline CF models. We show that each of our auxiliary tasks boosts learning of the embedding vectors, and that contrastive learning using Boost-RS outperforms attribute concatenation and multi-label learning. We also show that Boost-RS outperforms similarity-based models. Ablation studies and visualization of learned representations highlight the importance of using contrastive learning on some of the auxiliary data in boosting the embedding vectors.
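The idea of combining a CF objective with a contrastive auxiliary task can be illustrated with a minimal sketch. This is not the published Boost-RS implementation: the dot-product scoring model, the triplet margin loss, the fixed `aux_weight`, and all function names below are assumptions chosen for illustration, and the abstract's dynamic tuning of task weights is not modeled here.

```python
import numpy as np


def interaction_loss(enz_emb, sub_emb, pairs, labels):
    """Binary cross-entropy on dot-product scores (a plain matrix-factorization CF model)."""
    # pairs[:, 0] indexes enzymes, pairs[:, 1] indexes substrates.
    scores = np.sum(enz_emb[pairs[:, 0]] * sub_emb[pairs[:, 1]], axis=1)
    probs = 1.0 / (1.0 + np.exp(-scores))
    eps = 1e-9  # avoids log(0)
    bce = labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps)
    return float(-np.mean(bce))


def triplet_loss(emb, triplets, margin=1.0):
    """Contrastive (triplet margin) loss over (anchor, positive, negative) index triplets.

    Assumed setup: positives share a relational group with the anchor
    (e.g. the same branch of a hierarchy); negatives do not.
    """
    anchor = emb[triplets[:, 0]]
    positive = emb[triplets[:, 1]]
    negative = emb[triplets[:, 2]]
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def joint_loss(enz_emb, sub_emb, pairs, labels, triplets, aux_weight=0.5):
    """Main CF loss plus a weighted auxiliary contrastive loss on the enzyme embeddings."""
    main = interaction_loss(enz_emb, sub_emb, pairs, labels)
    aux = triplet_loss(enz_emb, triplets)
    return main + aux_weight * aux


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    enz_emb = rng.normal(size=(4, 8))   # 4 hypothetical enzymes, 8-dim embeddings
    sub_emb = rng.normal(size=(5, 8))   # 5 hypothetical substrates
    pairs = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])
    labels = np.array([1.0, 0.0, 1.0, 0.0])
    # Hypothetical grouping: enzymes 0 and 1 are related, as are 2 and 3.
    triplets = np.array([[0, 1, 2], [2, 3, 0]])
    print(joint_loss(enz_emb, sub_emb, pairs, labels, triplets))
```

In a trainable version, both embedding tables would be updated by gradient descent on the joint loss, so the relational signal in the triplets shapes the same vectors that the interaction predictor uses.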
A Python implementation of Boost-RS is provided at https://github.com/HassounLab/Boost-RS. The enzyme-substrate interaction data are available from the KEGG database (https://www.genome.jp/kegg/).