Nguyen Nguyen Doan Hieu, Pham Nhat Truong, Tran Duong Thanh, Wei Leyi, Malik Adeel, Manavalan Balachandran
Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, 16419, Republic of Korea.
Center for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macau, China.
J Cheminform. 2025 Aug 20;17(1):127. doi: 10.1186/s13321-025-01078-1.
Bitter peptides (BPs), derived from the hydrolysis of proteins in food, play a crucial role in both food science and biomedicine by influencing taste perception and participating in various physiological processes. Accurate identification of BPs is essential for understanding food quality and potential health impacts. Traditional machine learning approaches for BP identification have relied on conventional feature descriptors, achieving moderate success but struggling with the complexity of biological sequence data. Recent approaches based on protein language model embeddings and meta-learning have improved accuracy, but they frequently neglect the molecular representations of peptides and lack interpretability. In this study, we propose xBitterT5, a novel multimodal and interpretable framework for BP identification that uses pretrained transformer-based embeddings from BioT5+ computed over both the peptide sequence and its SELFIES molecular representation. By incorporating both peptide sequences and their molecular strings, xBitterT5 demonstrates superior performance compared to previous methods on the same benchmark datasets. Importantly, the model provides residue-level interpretability, highlighting chemically meaningful substructures that contribute significantly to a peptide's bitterness, thus offering mechanistic insights beyond black-box predictions. A user-friendly web server (https://balalab-skku.org/xBitterT5/) and a standalone version (https://github.com/cbbl-skku-org/xBitterT5/) are freely available to support both computational biologists and experimental researchers working on peptides in food science and biomedicine.
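The abstract describes pairing each peptide sequence with a molecular string. As a minimal sketch of that idea (not the authors' pipeline), the Python below builds a SMILES string for a peptide from per-residue fragments and, if the third-party `selfies` package happens to be installed, converts it to SELFIES with `selfies.encoder`. The residue table covers only six amino acids for illustration, and the names `RESIDUE_SMILES`, `peptide_to_smiles`, `to_selfies`, and `build_record` are hypothetical, as is the record layout; how xBitterT5 actually formats its inputs for BioT5+ is not stated here.

```python
# Illustrative backbone fragments (N-terminus to carbonyl) for a few L-amino
# acids. Assumption: this small table is for demonstration only; a real
# pipeline would cover all residues, typically via a cheminformatics toolkit.
RESIDUE_SMILES = {
    "G": "NCC(=O)",
    "A": "N[C@@H](C)C(=O)",
    "V": "N[C@@H](C(C)C)C(=O)",
    "L": "N[C@@H](CC(C)C)C(=O)",
    "F": "N[C@@H](Cc1ccccc1)C(=O)",
    "P": "N1[C@@H](CCC1)C(=O)",
}


def peptide_to_smiles(sequence: str) -> str:
    """Concatenate residue fragments and cap the C-terminus with a hydroxyl."""
    if not sequence:
        raise ValueError("empty peptide sequence")
    fragments = []
    for residue in sequence.upper():
        if residue not in RESIDUE_SMILES:
            raise ValueError(f"residue {residue!r} is not in the demo table")
        fragments.append(RESIDUE_SMILES[residue])
    # Each fragment ends in C(=O); the next fragment's N forms the amide bond,
    # and the final "O" completes the terminal carboxylic acid.
    return "".join(fragments) + "O"


def to_selfies(smiles: str):
    """Return a SELFIES string, or None when `selfies` is not installed."""
    try:
        import selfies  # optional third-party dependency
    except ImportError:
        return None
    return selfies.encoder(smiles)


def build_record(sequence: str) -> dict:
    """Bundle both views of one peptide (hypothetical record layout)."""
    smiles = peptide_to_smiles(sequence)
    return {
        "sequence": sequence.upper(),
        "smiles": smiles,
        "selfies": to_selfies(smiles),
    }


if __name__ == "__main__":
    print(build_record("GA"))
```

Keeping the sequence and the molecular string side by side in one record reflects the multimodal framing of the abstract: a model can then embed each view separately or jointly, which is a design decision this sketch leaves open.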