Prediction of aptamer-protein interacting pairs based on sparse autoencoder feature extraction and an ensemble classifier.

Affiliations

Institute of Environmental Systems Biology, College of Environmental and Engineering, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China.

School of Science, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China.

Publication information

Math Biosci. 2019 May;311:103-108. doi: 10.1016/j.mbs.2019.01.009. Epub 2019 Mar 15.

Abstract

Aptamer-protein interacting pairs play important roles in physiological functions and structural characterization. Despite the wide range of applications of aptamers, identifying aptamer-protein interacting pairs remains challenging, and existing approaches are limited. It is therefore vital to construct a high-performance prediction model for identifying aptamer-target interacting pairs. In this study, a novel ensemble method is presented to predict aptamer-protein interacting pairs by integrating sequence characteristics derived from the aptamers and the target proteins. The features extracted for aptamers were the compositions of amino acids and pseudo K-tuple nucleotides. In addition, a sparse autoencoder was used to characterize features of the target protein sequences. To remove redundant features, gradient boosting decision tree (GBDT) and incremental feature selection (IFS) methods were used to obtain the optimal combination of sequence features. Based on the 616 selected features, an ensemble of three support vector machine (SVM) sub-classifiers was used to construct the prediction model. Evaluated on an independent dataset, the predictor obtained an accuracy of 75.7%, a Matthews correlation coefficient of 0.478, and a Youden's index of 0.538, all superior to the values reached by other existing predictors. The results show that the model can be used to distinguish novel aptamer-protein interacting pairs and to reveal the interrelation between aptamers and proteins.
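
For readers unfamiliar with the three evaluation measures quoted in the abstract, the following is a minimal sketch of how accuracy, the Matthews correlation coefficient, and Youden's index are conventionally computed from binary confusion-matrix counts. The function name `binary_metrics` and the example counts are hypothetical illustrations; they are not the authors' code, and the counts are not taken from the paper's dataset.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, MCC and Youden's index from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate

    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Youden's index: sensitivity + specificity - 1.
    youden = sensitivity + specificity - 1
    return {"accuracy": accuracy, "mcc": mcc, "youden": youden}


# Hypothetical counts, chosen only to exercise the function.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print(m)
```

All three measures summarize the same confusion matrix, but the Matthews correlation coefficient and Youden's index are less sensitive to class imbalance than raw accuracy, which is presumably why the abstract reports them together.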
