ECAmyloid: An amyloid predictor based on ensemble learning and comprehensive sequence-derived features.

Affiliations

School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China.

Publication information

Comput Biol Chem. 2023 Jun;104:107853. doi: 10.1016/j.compbiolchem.2023.107853. Epub 2023 Mar 23.

DOI:10.1016/j.compbiolchem.2023.107853
PMID:36990028
Abstract

Amyloid fibrils formed by the misaggregation of amyloid proteins can lead to neuronal degeneration in Alzheimer's disease. Predicting amyloid proteins not only helps to explain their physicochemical properties and formation mechanisms, but also has significant implications for treating amyloid diseases and for developing new applications of amyloid materials. In this study, ECAmyloid, an ensemble learning model built on sequence-derived features, is proposed to identify amyloids. The sequence-derived features, namely the Pseudo Position-Specific Scoring Matrix (Pse-PSSM), Split Amino Acid Composition (SAAC), Solvent Accessibility (SA), and Secondary Structure Information (SSI), incorporate sequence-composition, evolutionary, and structural information. The individual learners of the ensemble are chosen by an incremental classifier selection strategy, and the final prediction is determined by voting over the predictions of these individual learners. Because the benchmark dataset is imbalanced, the Synthetic Minority Over-sampling Technique (SMOTE) is used to generate synthetic positive samples. To eliminate irrelevant and redundant features, correlation-based feature subset (CFS) selection combined with a heuristic search strategy is performed to obtain the optimal feature subset. Experimental results show that, under 10-fold cross-validation on the training dataset, the ensemble classifier achieves an accuracy of 98.29%, a sensitivity of 0.992, and a specificity of 0.974, substantially higher than the results obtained by its individual learners. Compared with the original feature set, training the ensemble on the optimal feature subset improves accuracy, sensitivity, specificity, MCC, F1-score, and G-Mean by 1.05%, 0.012, 0.01, 0.021, 0.011, and 0.011, respectively.
Moreover, comparison with existing methods on the same two independent test datasets demonstrates that the proposed method is an effective and promising predictor for large-scale identification of amyloid proteins. The data and code used to develop ECAmyloid have been shared on GitHub and can be freely downloaded at https://github.com/KOALA-L/ECAmyloid.git.
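
The abstract names SMOTE but does not describe how it works. Below is a minimal sketch of the standard SMOTE interpolation step. It is not ECAmyloid's own code, which is in the GitHub repository linked above. Each synthetic positive sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. Several details are illustrative assumptions:

- the helper name `smote`;
- Euclidean distance;
- picking the base sample at random (a common variant);
- the default `k=5`;
- the toy 2-D data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    Hypothetical helper for illustration; not taken from ECAmyloid.
    minority: list of equal-length feature vectors (lists of floats).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples

    # Precompute the k nearest minority-class neighbours (Euclidean) of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # random minority (positive) sample
        j = rng.choice(neighbours[i])      # one of its k nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        x, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy example: 4 minority (positive) samples in 2-D, oversampled by 6 points.
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
balanced_positives = positives + smote(positives, n_synthetic=6, k=2)
print(len(balanced_positives))  # 10
```

Each new point is a convex combination of two existing positives, so it stays inside the region those positives already span. This is the usual argument for SMOTE over simply duplicating positive samples.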

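
The abstract reports accuracy, sensitivity, specificity, MCC, F1-score, and G-Mean but does not give their formulas. The sketch below uses the standard confusion-matrix definitions and treats amyloids as the positive class. These definitions, the hypothetical `Confusion`/`metrics` names, and the example counts are assumptions, not ECAmyloid's evaluation code or data. The abstract quotes accuracy as a percentage; the function returns every metric as a fraction.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # amyloids predicted as amyloid
    fn: int  # amyloids predicted as non-amyloid
    tn: int  # non-amyloids predicted as non-amyloid
    fp: int  # non-amyloids predicted as amyloid


def metrics(c: Confusion) -> dict:
    """Standard binary-classification metrics; assumes both classes are present."""
    total = c.tp + c.tn + c.fp + c.fn
    sn = c.tp / (c.tp + c.fn)  # sensitivity: recall on amyloids
    sp = c.tn / (c.tn + c.fp)  # specificity: recall on non-amyloids
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sn,
        "specificity": sp,
        # Matthews correlation coefficient; 0.0 by convention if a margin is empty.
        "mcc": (c.tp * c.tn - c.fp * c.fn) / denom if denom else 0.0,
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
        "g_mean": math.sqrt(sn * sp),  # geometric mean of sensitivity and specificity
    }


# Illustrative counts only (not ECAmyloid results).
example = metrics(Confusion(tp=90, fn=10, tn=80, fp=20))
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

MCC and G-Mean are reported alongside accuracy because, on imbalanced data, accuracy alone can look high even when the minority class is predicted poorly.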

Similar articles

1
ECAmyloid: An amyloid predictor based on ensemble learning and comprehensive sequence-derived features.
Comput Biol Chem. 2023 Jun;104:107853. doi: 10.1016/j.compbiolchem.2023.107853. Epub 2023 Mar 23.
2
DeepStack-DTIs: Predicting Drug-Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier.
Interdiscip Sci. 2022 Jun;14(2):311-330. doi: 10.1007/s12539-021-00488-7. Epub 2021 Nov 3.
3
Sequence Based Prediction of Antioxidant Proteins Using a Classifier Selection Strategy.
PLoS One. 2016 Sep 23;11(9):e0163274. doi: 10.1371/journal.pone.0163274. eCollection 2016.
4
Diagnosing Coronary Artery Disease on the Basis of Hard Ensemble Voting Optimization.
Medicina (Kaunas). 2022 Nov 28;58(12):1745. doi: 10.3390/medicina58121745.
5
RFAmyloid: A Web Server for Predicting Amyloid Proteins.
Int J Mol Sci. 2018 Jul 16;19(7):2071. doi: 10.3390/ijms19072071.
6
Investigation and identification of protein carbonylation sites based on position-specific amino acid composition and physicochemical features.
BMC Bioinformatics. 2017 Mar 14;18(Suppl 3):66. doi: 10.1186/s12859-017-1472-8.
7
AMYPred-FRL is a novel approach for accurate prediction of amyloid proteins by using feature representation learning.
Sci Rep. 2022 May 11;12(1):7697. doi: 10.1038/s41598-022-11897-z.
8
Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naïve Bayes.
PLoS One. 2014 Jan 24;9(1):e86703. doi: 10.1371/journal.pone.0086703. eCollection 2014.
9
PrUb-EL: A hybrid framework based on deep learning for identifying ubiquitination sites in Arabidopsis thaliana using ensemble learning strategy.
Anal Biochem. 2022 Dec 1;658:114935. doi: 10.1016/j.ab.2022.114935. Epub 2022 Oct 4.
10
Clinical data classification using an enhanced SMOTE and chaotic evolutionary feature selection.
Comput Biol Med. 2020 Nov;126:103991. doi: 10.1016/j.compbiomed.2020.103991. Epub 2020 Sep 18.

Cited by

1
Predicting amyloid proteins using attention-based long short-term memory.
PeerJ Comput Sci. 2025 Feb 7;11:e2660. doi: 10.7717/peerj-cs.2660. eCollection 2025.