Charoenkwan Phasit, Nantasenamat Chanin, Hasan Md Mehedi, Manavalan Balachandran, Shoombuatong Watshara
Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand.
Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
Bioinformatics. 2021 Sep 9;37(17):2556-2562. doi: 10.1093/bioinformatics/btab133.
Identifying bitter peptides experimentally is expensive and time-consuming. Given the huge number of newly available peptide sequences in the post-genomic era, automated computational models for identifying novel bitter peptides are highly desirable.
In this work, we present BERT4Bitter, a bidirectional encoder representations from transformers (BERT)-based model for predicting bitter peptides directly from their amino acid sequences without using any structural information. To the best of our knowledge, this is the first time a BERT-based model has been employed to identify bitter peptides. Compared to widely used machine learning models, BERT4Bitter achieved the best performance, with accuracies of 0.861 and 0.922 for cross-validation and independent tests, respectively. Furthermore, extensive empirical benchmarking experiments on the independent dataset demonstrated that BERT4Bitter clearly outperformed the existing method, with improvements of 8.0% in accuracy and 16.0% in Matthews correlation coefficient, highlighting the effectiveness and robustness of BERT4Bitter. We believe that the BERT4Bitter method proposed herein will be a useful tool for rapidly screening and identifying novel bitter peptides for drug development and nutritional research.
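The two metrics reported above, accuracy and the Matthews correlation coefficient, are both functions of a binary confusion matrix. The following is a minimal sketch of their standard definitions, not the authors' evaluation code; the function names and the confusion counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all peptides classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (bitter = positive class).
tp, tn, fp, fn = 45, 40, 10, 5
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it stays informative when the bitter and non-bitter classes are imbalanced.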
A user-friendly web server for BERT4Bitter is freely accessible at http://pmlab.pythonanywhere.com/BERT4Bitter.
Supplementary data are available at Bioinformatics online.