HemoDL: Hemolytic peptides prediction by double ensemble engines from Rich sequence-derived and transformer-enhanced information.

Author information

Yang Sen, Xu Piao

Affiliations

School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China; The Affiliated Changzhou No.2 People's Hospital of Nanjing Medical University, Changzhou, 213164, China.

College of Economics and Management, Nanjing Forestry University, China.

Publication information

Anal Biochem. 2024 Jul;690:115523. doi: 10.1016/j.ab.2024.115523. Epub 2024 Mar 28.

Abstract

Hemolytic peptides can trigger hemolysis by rupturing red blood cell membranes and causing cell disruption. Because identifying them in the laboratory is labor-intensive and time-consuming, accurate, high-throughput computational prediction of hemolytic peptides is crucial for keeping pace with the growth of peptide sequence data in proteomics and peptidomics. In this study, we present HemoDL, an ensemble learning model that uses a double LightGBM framework to learn the distinct distributions of sequence characteristics and predict the hemolytic activity of peptides. To identify the most informative encoding features, we compare 17 widely used features across four benchmark datasets. Our investigation reveals that CTD, BPF, Charge, AAC, GDPC, ATC, QSO, and transformer-based features make larger positive contributions to detecting the hemolytic activity of peptides. Comparison with eight state-of-the-art methods demonstrates that HemoDL outperforms them, achieving higher Matthews Correlation Coefficient (MCC) values on the four test datasets, with improvements ranging from 6.30% to 16.04%, 6.63% to 11.26%, 4.76% to 9.92%, and 7.41% to 15.03%, respectively. Additionally, HemoDL is provided with a user-friendly graphical interface, available at https://github.com/abcair/HemoDL. In summary, HemoDL, which combines CTD, BPF, Charge, AAC, GDPC, ATC, QSO, and transformer-based encoding features within a double LightGBM learning framework, achieves high accuracy in predicting the hemolytic activity of peptides.
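As a rough illustration of what two of the listed encodings and the reported evaluation metric compute, the sketch below implements amino acid composition (AAC), a crude net-charge feature, and the Matthews Correlation Coefficient. It is a simplified sketch under stated assumptions, not HemoDL's own code: the function names are hypothetical, and the charge definition (+1 per K/R, -1 per D/E, ignoring histidine and the termini) is an assumed simplification. The encoders in the HemoDL repository may compute these features differently.

```python
import math
from collections import Counter

# The 20 standard amino acids in a fixed order, so feature vectors line up across peptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq: str) -> list[float]:
    """Hypothetical AAC encoder: fraction of each standard residue in the peptide."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def net_charge(seq: str) -> int:
    """Assumed simplified charge: +1 for each K/R, -1 for each D/E (His and termini ignored)."""
    seq = seq.upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts (0.0 if undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    peptide = "KLLKKLLKLLKK"  # arbitrary example sequence, not taken from the benchmarks
    features = aac(peptide) + [net_charge(peptide)]
    print(len(features), net_charge(peptide))
    print(round(mcc(tp=50, tn=40, fp=10, fn=0), 4))
```

Each encoding yields a fixed-length numeric vector, which is the kind of input a gradient-boosting classifier such as LightGBM accepts. How HemoDL combines its encodings across its two LightGBM engines is not shown here.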
