Knoshaug Eric P, Sun Peipei, Nag Ambarish, Nguyen Huong, Mattoon Erin M, Zhang Ningning, Liu Jian, Chen Chen, Cheng Jianlin, Zhang Ru, St John Peter, Umen James
Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado, USA.
Donald Danforth Plant Science Center, St. Louis, MO, USA.
Plant Direct. 2023 Dec 1;7(12):e527. doi: 10.1002/pld3.527. eCollection 2023 Dec.
The rapid accumulation of sequenced plant genomes over the past decade has outpaced progress on the still-difficult problem of genome-wide protein-coding gene annotation. A substantial fraction of protein-coding genes in all plant genomes are poorly annotated or unannotated and remain functionally uncharacterized. We identified unannotated proteins in three model organisms representing distinct branches of the green lineage (Viridiplantae): Arabidopsis thaliana (eudicot), Oryza sativa (rice; monocot), and Chlamydomonas reinhardtii (Chlorophyte alga). Using similarity searching, we identified a subset of these unannotated proteins that were conserved among the three species and defined them as Deep Green proteins. Bioinformatic, genomic, and structural predictions were performed to begin classifying Deep Green genes and proteins. Compared with the whole proteome of each species, the Deep Green set was enriched for proteins with predicted chloroplast-targeting signals, suggestive of photosynthetic or plastid functions, a result consistent with its enrichment for daylight-phase diurnal expression patterns. Structural predictions using AlphaFold and comparisons to known structures showed that a significant proportion of Deep Green proteins may possess novel folds. Although currently defined for only three organisms, the Deep Green genes and proteins provide a starting resource of high-value targets for further investigation of potentially new protein structures and functions conserved across the green lineage.