Compressed learning and its applications to subcellular localization.

Author information

Zheng Zhong-Long, Guo Li, Jia Jiong, Xie Chen-Mao, Zeng Wen-Cai, Yang Jie

Affiliation

Department of Computer Science, Zhejiang Normal University, China.

Publication information

Protein Pept Lett. 2011 Sep;18(9):925-34. doi: 10.2174/092986611796011464.

DOI:10.2174/092986611796011464

PMID:21443498

Abstract

One of the main challenges in biological applications is to predict protein subcellular localization automatically and accurately. To this end, a wide variety of machine learning methods have been proposed in recent years. Most of them focus on finding the optimal classification scheme, and few take into account simplifying the complexity of biological systems. Traditionally, such biological data are analyzed by first performing feature selection and then classification. Motivated by compressed sensing (CS) theory, we propose a methodology that performs compressed learning with a sparseness criterion, such that feature selection and dimension reduction are merged into one analysis. The proposed methodology decreases the complexity of the biological system while increasing the accuracy of protein subcellular localization. Experimental results are encouraging, indicating that the aforementioned sparse methods are promising for complicated biological problems, such as predicting the subcellular localization of Gram-negative bacterial proteins.
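
The abstract does not give the algorithm itself, so the following is only a minimal illustration of how a sparseness criterion can merge feature selection with model fitting. It is not the authors' method. It assumes an L1-penalised least-squares model (the lasso) solved by cyclic coordinate descent on synthetic data. The function names, the penalty value, and the synthetic data are all hypothetical choices made for this sketch.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=200):
    """Minimise (1/2n)*||y - Xw||^2 + penalty*||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # residual = y - Xw; equals y while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # a constant-zero feature can never be selected
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def make_synthetic_data(seed=0, n_samples=200, n_features=20):
    """Hypothetical data: only features 2 and 7 influence the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [3.0 * row[2] - 2.0 * row[7] + rng.gauss(0.0, 0.1) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_data()
    weights = lasso_coordinate_descent(X, y, penalty=0.3)
    # Features whose weight is exactly zero are dropped by the L1 penalty,
    # so selection and fitting happen in a single optimisation.
    selected = [j for j, value in enumerate(weights) if value != 0.0]
    print("selected features:", selected)
```

In this sketch the L1 penalty sets most weights exactly to zero, so the surviving indices act as the selected features and no separate feature-selection pass runs before fitting. A real subcellular localization task would need a classification loss and real protein features, neither of which is specified in the abstract.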
