Prediction of apoptosis protein subcellular localization via heterogeneous features and hierarchical extreme learning machine.

Affiliations

a School of Mathematics and Statistics, Xidian University, Xi'an, PR China.

b School of Electronic Engineering, Xidian University, Xi'an, PR China.

Publication information

SAR QSAR Environ Res. 2019 Mar;30(3):209-228. doi: 10.1080/1062936X.2019.1576222. Epub 2019 Feb 26.

Abstract

Apoptosis is a fundamental process that controls normal tissue homeostasis by regulating the balance between cell proliferation and cell death. Predicting the subcellular location of apoptosis proteins is very helpful for understanding the mechanism of programmed cell death, and predicting protein subcellular localization with bioinformatic techniques opens up many opportunities in related fields. In this work, we propose using a hierarchical extreme learning machine (H-ELM) to classify high-dimensional input data without a dimension-reduction step, which yields acceptable results. Features are extracted from different perspectives and then fused. From the position-specific scoring matrix (PSSM), two types of features are derived: the first describes the correlation within the sequence by applying an autocorrelation function to relatively random sections of the sequence, and the second is the Kullback-Leibler (K-L) divergence between the two distributions formed by the amino acids' constituent proportions. Experiments show that mixing features from different sources by simple concatenation yields poor results, whereas the synthetic feature fused using t-distributed stochastic neighbor embedding (t-SNE) greatly improves classification. Finally, by tuning the hyper-parameters of H-ELM, the highest overall accuracy reaches 87.5% on the ZD98 dataset and 92.4% on the CL317 dataset.
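
The two kinds of PSSM-derived features described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `pssm_autocovariance`, `composition` and `kl_divergence` are hypothetical helper names; the autocorrelation is a standard lag-based autocovariance computed over the whole PSSM rather than over the "relatively random sections" the abstract mentions; and because the abstract does not say which two composition distributions are compared, `kl_divergence` simply accepts any two 20-dimensional distributions, with `composition` building one from raw residue counts (plus an assumed small pseudocount) purely for illustration. The t-SNE fusion and the H-ELM classifier are not shown.

```python
import math
import random
from collections import Counter

# The 20 standard amino acids in PSI-BLAST PSSM column order (assumed ordering).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pssm_autocovariance(pssm, max_lag):
    """Lag-based autocovariance of each PSSM column (assumed feature form).

    pssm: list of L rows, each holding 20 scores (one per amino acid column).
    Returns 20 * max_lag values, ordered by column first, then by lag 1..max_lag.
    """
    length = len(pssm)
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must be between 1 and len(pssm) - 1")
    n_cols = len(pssm[0])
    # Mean score of each column over the whole sequence.
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    features = []
    for j in range(n_cols):
        for lag in range(1, max_lag + 1):
            # Average product of deviations for positions `lag` residues apart.
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


def composition(seq, pseudocount=1e-6):
    """Amino acid composition of `seq` as a probability distribution.

    The pseudocount (an assumption) keeps every entry positive so the
    K-L divergence stays finite; non-standard residue letters are ignored.
    """
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return [(counts[a] + pseudocount) / total for a in AMINO_ACIDS]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in nats; q must be > 0 wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    # Toy inputs only: a random 30 x 20 "PSSM" and two arbitrary sequences.
    rng = random.Random(0)
    toy_pssm = [[rng.randint(-5, 5) for _ in range(20)] for _ in range(30)]
    ac_features = pssm_autocovariance(toy_pssm, max_lag=3)
    kl_feature = kl_divergence(composition("ACDKLLMAV"), composition("GGSSTPLLE"))
    # Naive concatenation of the two feature types (20 * 3 + 1 = 61 values).
    combined = ac_features + [kl_feature]
    print(len(combined))
```

The final concatenation corresponds to the "simple concatenation" baseline that the abstract reports as giving poor results; the paper's improvement comes from fusing features with t-SNE instead, which this sketch does not attempt.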
