Poudel Suresh, Cope Alexander L, O'Dell Kaela B, Guss Adam M, Seo Hyeongmin, Trinh Cong T, Hettich Robert L
Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA.
The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA.
Biotechnol Biofuels. 2021 May 10;14(1):116. doi: 10.1186/s13068-021-01964-4.
Mass spectrometry-based proteomics can identify and quantify thousands of proteins from individual microbial species, but a substantial fraction of these proteins are unannotated and are therefore classified as proteins of unknown function (PUFs). Because meaningful metabolic information is difficult to extract from them, PUFs are often overlooked or discarded during data analysis, even though they may be critically important for functional activities, particularly in metabolic engineering research.
We optimized and applied a pipeline that combines several "guilt-by-association" (GBA) metrics (differential expression and co-expression analyses of high-throughput mass spectrometry proteome data, and phylogenetic coevolution analysis) with sequence homology-based approaches to assign putative functions to PUFs in Clostridium thermocellum. Together, these analyses provided putative functional information for over 95% of the PUFs detected by mass spectrometry in a wild-type and/or an engineered strain of C. thermocellum. In particular, we validated a PUF predicted to be an acyltransferase (WP_003519433.1), which showed functional activity towards 2-phenylethyl alcohol, consistent with our GBA and sequence homology-based predictions.
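To illustrate the co-expression idea behind GBA, the following is a minimal sketch, not the authors' pipeline. It assumes a table of per-sample protein abundances and Pearson correlation with a hypothetical cutoff (`min_r=0.8`). The PUF is assigned the most common annotation among the annotated proteins whose abundance profiles correlate with it. All protein names, annotations, and values are invented for illustration.

```python
from collections import Counter

import numpy as np


def coexpression_neighbors(abundance, puf, annotations, min_r=0.8):
    """Return (protein, annotation, r) for annotated proteins whose abundance
    profile correlates with the PUF's profile at Pearson r >= min_r,
    sorted from strongest to weakest correlation."""
    target = np.asarray(abundance[puf], dtype=float)
    hits = []
    for protein, profile in abundance.items():
        # Only annotated proteins can "lend" a function to the PUF.
        if protein == puf or protein not in annotations:
            continue
        r = np.corrcoef(target, np.asarray(profile, dtype=float))[0, 1]
        # Zero-variance profiles give NaN, and NaN >= min_r is False, so they are skipped.
        if r >= min_r:
            hits.append((protein, annotations[protein], float(r)))
    hits.sort(key=lambda h: h[2], reverse=True)
    return hits


def putative_function(hits):
    """Majority vote over neighbor annotations; None if there are no neighbors."""
    if not hits:
        return None
    votes = Counter(annotation for _, annotation, _ in hits)
    return votes.most_common(1)[0][0]


# Hypothetical log-abundances across four samples (e.g., strains or conditions).
abundance = {
    "PUF_1": [1.0, 2.0, 3.0, 4.0],
    "acyltransferase_A": [1.1, 2.1, 2.9, 4.2],
    "acyltransferase_B": [2.0, 4.0, 6.0, 8.0],
    "hydrogenase_C": [4.0, 3.0, 2.0, 1.0],
}
annotations = {
    "acyltransferase_A": "acyltransferase",
    "acyltransferase_B": "acyltransferase",
    "hydrogenase_C": "hydrogenase",
}

hits = coexpression_neighbors(abundance, "PUF_1", annotations)
print(putative_function(hits))
```

Here `hydrogenase_C` is anticorrelated (r = -1) and is excluded, so both retained neighbors vote "acyltransferase". A majority vote is only one simple way to combine neighbor annotations. In practice, co-expression evidence would be weighed alongside the other GBA metrics and homology-based annotations.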
This work demonstrates the value of combining sequence homology-based annotations with empirical evidence grounded in the concept of GBA to broadly predict putative functions for PUFs, opening avenues for further interrogation through targeted experiments.