Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
J Theor Biol. 2011 Aug 21;283(1):44-52. doi: 10.1016/j.jtbi.2011.05.023. Epub 2011 May 26.
Protein-protein interactions (PPIs) play an important role in biological processes. Although much effort has been devoted to identifying novel PPIs by integrating experimental biological knowledge, many difficulties remain because sufficient protein structural and functional information is lacking. It is therefore highly desirable to develop methods that predict PPIs from amino acid sequences alone. However, sequence-based predictors often struggle with high dimensionality, which causes over-fitting and high computational complexity, as well as with redundancy in the sequential feature vectors. In this paper, a novel computational approach based on compressed sensing theory is proposed to predict PPIs in the yeast Saccharomyces cerevisiae from primary sequence, and it achieves promising results. The key advantage of the proposed compressed sensing algorithm is that it can compress the original high-dimensional protein sequential feature vector into a much lower-dimensional but more condensed space by taking the sparsity of the original signal into account. What makes compressed sensing especially attractive for protein sequence analysis is that the signal can be reconstructed from far fewer measurements than traditional Nyquist sampling theory considers necessary. Experimental results demonstrate that the proposed compressed sensing method is powerful for analyzing noisy biological data and reducing redundancy in feature vectors. The proposed method represents a new strategy for dealing with high-dimensional discrete protein models and has great potential to be extended to many other complicated biological systems.
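The compression and reconstruction idea described above can be illustrated with a minimal sketch. The abstract does not specify the measurement matrix, the reconstruction algorithm, or the feature encoding, so everything here is an assumption chosen for illustration: a random Gaussian measurement matrix `phi`, Orthogonal Matching Pursuit (OMP) for recovery, and a synthetic sparse vector standing in for a protein sequence feature vector. The sizes `n`, `m`, and `k` are hypothetical.

```python
import numpy as np


def compress(x, phi):
    """Project an n-dimensional vector onto m << n linear measurements."""
    return phi @ x


def omp_reconstruct(y, phi, sparsity):
    """Recover a `sparsity`-sparse vector from y = phi @ x using
    Orthogonal Matching Pursuit (an assumed, illustrative solver)."""
    n = phi.shape[1]
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        # Pick the column most correlated with the current residual,
        # skipping columns that are already in the support.
        correlations = np.abs(phi.T @ residual)
        if support:
            correlations[support] = 0.0
        support.append(int(np.argmax(correlations)))
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(phi[:, support], y, rcond=None)
        residual = y - phi[:, support] @ coef
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat


# Synthetic stand-in for a sparse, high-dimensional feature vector
# (hypothetical sizes; not real protein data).
rng = np.random.default_rng(0)
n, m, k = 400, 120, 5  # original dimension, measurements, non-zeros
x = np.zeros(n)
nonzero = rng.choice(n, size=k, replace=False)
x[nonzero] = rng.uniform(1.0, 2.0, size=k) * rng.choice([-1.0, 1.0], size=k)

# Random Gaussian measurement matrix (an assumption, not from the paper).
phi = rng.standard_normal((m, n)) / np.sqrt(m)

y = compress(x, phi)               # compressed representation, length m
x_hat = omp_reconstruct(y, phi, k)  # reconstruction from m measurements

print("compressed length:", y.shape[0])
print("max reconstruction error:", np.max(np.abs(x_hat - x)))
```

In a prediction setting, a compressed vector such as `y` would presumably be what is passed to a classifier in place of the full feature vector; the reconstruction step is shown only to illustrate that a sparse signal can be recovered from far fewer than `n` measurements.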