Tiruveedula Gopi Siva Sai, Wangikar Pramod P
Department of Chemical Engineering, National Institute of Technology Karnataka, Surathkal, Mangalore, India.
Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, India.
PLoS One. 2017 Jun 8;12(6):e0178565. doi: 10.1371/journal.pone.0178565. eCollection 2017.
Cyanobacteria, a group of photosynthetic prokaryotes, dominate the earth with ~10¹⁵ g wet biomass. Despite diversity in habitats and an ancient origin, the cyanobacterial phylum has retained a significant core genome. Cyanobacteria are being explored for direct conversion of solar energy and carbon dioxide into biofuels. For this, efficient cyanobacterial strains will need to be designed via metabolic engineering. This will require identification of target knockouts that channel the flow of carbon toward the product of interest while minimizing deletions of essential genes. We propose the "Gene Conservation Index" (GCI) as a quick measure to predict gene essentiality in cyanobacteria. GCI is based on the phylogenetic profile of a gene, constructed with a reduced dataset of cyanobacterial genomes. GCI is the percentage of organism clusters in the reduced dataset in which the query gene is present. Of the 750 genes deemed essential in the experimental study on S. elongatus PCC 7942, we found 494 to be conserved across the phylum; these largely comprise the essential metabolic pathways. In contrast, the conserved but non-essential genes broadly comprise genes required under stress conditions. Exceptions to this rule include genes such as the glycogen synthesis and degradation enzymes, deoxyribose-phosphate aldolase (DERA), glucose-6-phosphate 1-dehydrogenase (zwf) and fructose-1,6-bisphosphatase class 1, which are conserved but non-essential. While the essential genes are to be avoided during gene knockout studies as potentially lethal deletions, the non-essential but conserved set of genes could be interesting targets for metabolic engineering. Further, we identify clusters of co-evolving genes (CCG), which provide insights that may be useful in annotation. Principal component analysis (PCA) plots of the CCGs are demonstrated as data visualization tools that are complementary to the conventional heatmaps.
Our dataset consists of phylogenetic profiles for 23,643 non-redundant cyanobacterial genes. We believe that the data and the analysis presented here will be a valuable resource for the scientific community interested in cyanobacteria.
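The GCI definition given in the abstract (the percentage of organism clusters in which the query gene is present) can be illustrated with a minimal Python sketch. The gene names, cluster names, presence/absence values, essentiality labels and the 100% conservation cutoff below are all hypothetical placeholders chosen for illustration. The abstract does not state the cutoff or the clustering procedure, so this is not the authors' implementation.

```python
from typing import Dict


def gene_conservation_index(profile: Dict[str, bool]) -> float:
    """Return the percentage of organism clusters in which the gene is present.

    `profile` maps an organism-cluster name to True (gene present) or
    False (gene absent); it plays the role of a phylogenetic profile.
    """
    if not profile:
        raise ValueError("profile must cover at least one organism cluster")
    present = sum(1 for is_present in profile.values() if is_present)
    return 100.0 * present / len(profile)


# Hypothetical phylogenetic profiles (gene -> cluster -> present?).
profiles = {
    "geneA": {"cluster1": True, "cluster2": True, "cluster3": True, "cluster4": True},
    "geneB": {"cluster1": True, "cluster2": False, "cluster3": False, "cluster4": True},
    "geneC": {"cluster1": True, "cluster2": True, "cluster3": True, "cluster4": True},
}

# Hypothetical experimental essentiality labels for the same genes.
essential = {"geneA": True, "geneB": False, "geneC": False}

# Assumed cutoff: a gene counts as "conserved" only if present in every cluster.
CONSERVED_CUTOFF = 100.0

gci = {gene: gene_conservation_index(p) for gene, p in profiles.items()}

# Conserved but non-essential genes: the set the abstract suggests may be
# interesting targets for metabolic engineering.
candidate_targets = sorted(
    gene
    for gene, value in gci.items()
    if value >= CONSERVED_CUTOFF and not essential[gene]
)

for gene in sorted(gci):
    print(f"{gene}: GCI = {gci[gene]:.1f}%")
print("conserved, non-essential:", candidate_targets)
```

In this toy input, geneA is conserved and essential (a knockout to avoid), geneB is neither, and geneC is conserved but non-essential. A real analysis would replace the dictionaries with the published phylogenetic profiles and experimentally determined essentiality data.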