Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States.
Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States.
Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad511.
Sphagnum-dominated peatlands store a substantial amount of terrestrial carbon, yet the genus is undersampled and understudied. No experimental crystal structure from any Sphagnum species exists in the Protein Data Bank, and fewer than 200 Sphagnum-related genes have structural models available in the AlphaFold Protein Structure Database. Tools and resources are needed to help bridge these gaps and to enable analysis of the other structural proteomes that accurate structure prediction now makes possible.
We present the predicted structural proteome of Sphagnum divinum (25,134 primary transcripts) computed using AlphaFold; structural alignment results for all high-confidence models against an annotated, nonredundant crystallographic database of over 90,000 structures; a structure-based classification of putative Enzyme Commission (EC) numbers across this proteome; and the computational method used to perform this proteome-scale, structure-based annotation.
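The annotation idea summarized above (keep only high-confidence models, align them to annotated crystal structures, and transfer an EC number from a structural match) can be illustrated with a minimal sketch. It is not the authors' code. The confidence cutoff (mean pLDDT of 70), the TM-score cutoff of 0.5, the `AlignmentHit` record, the `assign_ec` function, and the target identifiers are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds; the source does not state which cutoffs were used.
PLDDT_CUTOFF = 70.0  # assumed definition of a "high-confidence" model (mean pLDDT)
TM_CUTOFF = 0.5      # assumed TM-score above which two structures share a fold


@dataclass
class AlignmentHit:
    """One structural alignment of a predicted model against a reference structure."""
    target_id: str             # placeholder ID of a crystal structure in the reference set
    tm_score: float            # TM-score of the alignment (0 to 1)
    ec_number: Optional[str]   # EC annotation of the target, or None if unannotated


def assign_ec(mean_plddt: float, hits: List[AlignmentHit]) -> Optional[str]:
    """Return a putative EC number for one model, or None if none can be assigned."""
    # Skip low-confidence models entirely.
    if mean_plddt < PLDDT_CUTOFF:
        return None
    # Keep only annotated targets that pass the fold-similarity cutoff.
    candidates = [h for h in hits if h.ec_number is not None and h.tm_score >= TM_CUTOFF]
    if not candidates:
        return None
    # Transfer the annotation of the best-scoring structural match.
    best = max(candidates, key=lambda h: h.tm_score)
    return best.ec_number


hits = [
    AlignmentHit("target_A", 0.82, "2.7.11.1"),
    AlignmentHit("target_B", 0.91, None),       # best match, but unannotated
    AlignmentHit("target_C", 0.45, "3.1.1.1"),  # annotated, but below the TM-score cutoff
]
print(assign_ec(88.2, hits))  # prints 2.7.11.1
print(assign_ec(55.0, hits))  # prints None (model below the assumed pLDDT cutoff)
```

Transferring the single best annotated hit is only the simplest possible rule. A real pipeline might instead weigh several hits, require agreement among them, or assign only a partial EC number when matches disagree.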
All data and code are available in public repositories, detailed at https://github.com/BSDExabio/SAFA. The structural models of the S. divinum proteome have been deposited in the ModelArchive repository at https://modelarchive.org/doi/10.5452/ma-ornl-sphdiv.