Kleine Liliana Lopez, Monnet Véronique, Pechoux Christine, Trubuil Alain
HFSP J. 2008 Feb;2(1):29-41. doi: 10.2976/1.2820377. Epub 2008 Jan 7.
Despite the quantity of high-throughput data now available, the precise role of many proteins has not been elucidated. Available methods for classifying proteins and reconstructing metabolic networks are efficient for finding global categories, but do not answer the biologist's specific and targeted questions. Following Yamanishi et al. [Yamanishi, Y, Vert, JP, Nakaya, A, and Kanehisa, M (2003). "Extraction of correlated clusters from multiple genomic data by generalized kernel canonical correlation analysis." Bioinformatics 19, Suppl. 1, i323-i330], we used a kernel canonical correlation analysis (KCCA) to predict the role of the bacterial peptidase PepF. To determine relationships between proteins, we integrated five existing data types available for Lactococcus lactis: protein metabolic networks, microarray data, phylogenetic profiles, distances between proteins, and incomplete two-dimensional gel data (for which we propose a completion strategy). The predicted relationships were then used to guide our laboratory work, which confirmed most of the predictions. PepF had previously been characterized as a zinc-dependent endopeptidase [Nardi, M, Renault, P, and Monnet, V (1997). "Duplication of the pepF gene and shuffling of DNA fragments on the lactose plasmid of Lactococcus lactis." J. Bacteriol. 179, 4164-4171; Monnet, V, Nardi, M, Chopin, MC, and Gripon, JC (1994). "Biochemical and genetic characterization of PepF, an oligoendopeptidase from Lactococcus lactis." J. Biol. Chem. 269, 32070-32076]. By analyzing a PepF mutant, we confirmed that PepF participates in protein secretion, as suggested by the strong relationship between signal peptidase I and PepF predicted by the KCCA. The global nature of our approach made it possible to discover pleiotropic roles of the protein that had remained undetected by classical approaches.
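To make the KCCA step concrete, the following is a minimal sketch of regularized kernel canonical correlation analysis for two data views. It is not the authors' implementation: the abstract describes a generalized, multi-view KCCA over five data types, whereas this sketch handles only two kernels. The function names (`center_kernel`, `rbf_kernel`, `kcca`), the RBF kernel choice, and the regularization value `reg` are all assumptions made for illustration, and the data in the usage lines are synthetic.

```python
import numpy as np

def center_kernel(K):
    """Center a kernel matrix in feature space: H K H with H = I - 11^T/n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel between the rows of X (illustrative choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kcca(K1, K2, reg=0.1, n_components=2):
    """Two-view regularized KCCA (hypothetical helper, not from the paper).

    Maximizes a^T K1 K2 b subject to a^T (K1 + reg*I)^2 a = 1 and
    b^T (K2 + reg*I)^2 b = 1, solved through the SVD of
    (K1 + reg*I)^{-1} K1 K2 (K2 + reg*I)^{-1}.
    Returns the regularized canonical correlations and the per-sample
    scores for each view.
    """
    n = K1.shape[0]
    K1c, K2c = center_kernel(K1), center_kernel(K2)
    R1 = K1c + reg * np.eye(n)
    R2 = K2c + reg * np.eye(n)
    left = np.linalg.solve(R1, K1c @ K2c)   # R1^{-1} K1 K2
    M = np.linalg.solve(R2, left.T).T       # ... R2^{-1} (R2 is symmetric)
    U, s, Vt = np.linalg.svd(M)
    alpha = np.linalg.solve(R1, U[:, :n_components])
    beta = np.linalg.solve(R2, Vt[:n_components].T)
    return s[:n_components], K1c @ alpha, K2c @ beta

# Synthetic usage: two noisy views of one shared latent variable.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 1))
view1 = np.hstack([latent, rng.normal(size=(60, 2))])
view2 = np.hstack([np.sin(latent), rng.normal(size=(60, 2))])
corrs, scores1, scores2 = kcca(rbf_kernel(view1, 0.5), rbf_kernel(view2, 0.5))
```

In a setting like the one described above, each row would correspond to a protein and each kernel to one data type; proteins whose scores lie close together on the leading components would be candidates for a functional relationship. The regularization term matters in practice because unregularized KCCA can reach a perfect correlation on any finite sample.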