Pang Long, Wang Junjie, Zhao Lingling, Wang Chunyu, Zhan Hui
Harbin Nebula Bioinformatics Technology Development Co., Ltd., Harbin, China.
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
Front Genet. 2019 Jan 18;9:751. doi: 10.3389/fgene.2018.00751. eCollection 2018.
The aberrant distribution of proteins across cellular compartments or organelles leads to many human diseases, including neurodegenerative diseases such as Alzheimer's disease. Predicting protein subcellular localization plays an important role in understanding the mechanisms of protein function, pathogenesis, and disease therapy. This paper proposes a novel subcellular localization method that integrates a Convolutional Neural Network (CNN) with eXtreme Gradient Boosting (XGBoost): the CNN acts as a feature extractor that automatically obtains features from the original sequence information, and an XGBoost classifier acts as a recognizer that identifies the protein subcellular localization from the output of the CNN. Experiments are carried out on three protein datasets. The results show that the CNN-XGBoost method performs better than general protein subcellular localization methods.
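The feature-extraction stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-hot encoding, the filter count and width, the example sequence, and the helper names (`one_hot`, `make_filters`, `conv_features`) are all assumptions made for illustration, and the filters are random rather than trained, whereas the paper's CNN learns its filters from data.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode a protein sequence as a list of 20-dimensional one-hot rows."""
    rows = []
    for residue in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        if idx is not None:  # unknown residues (e.g. 'X') stay all-zero
            row[idx] = 1.0
        rows.append(row)
    return rows


def make_filters(n_filters, width, seed=0):
    """Random, untrained filters of shape [n_filters][width][20] (assumption)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in AMINO_ACIDS] for _ in range(width)]
        for _ in range(n_filters)
    ]


def conv_features(sequence, filters):
    """1D convolution, ReLU, then global max pooling: one feature per filter."""
    encoded = one_hot(sequence)
    features = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(encoded) - width + 1):
            activation = sum(
                filt[k][j] * encoded[start + k][j]
                for k in range(width)
                for j in range(len(AMINO_ACIDS))
            )
            best = max(best, activation)
        features.append(best)
    return features


# Hypothetical example sequence; each sequence maps to a fixed-length vector.
filters = make_filters(n_filters=8, width=3)
features = conv_features("MKTAYIAKQRQISFVKSHFSRQ", filters)
print(len(features), features)
```

Because global max pooling yields one value per filter, sequences of different lengths map to vectors of the same size. Stacking these vectors into a feature matrix gives an input that a gradient-boosted tree classifier, for example `xgboost.XGBClassifier`, could be fitted on together with the localization labels, which corresponds to the recognizer stage of the pipeline.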