Department of Chemistry, Connecticut College, New London, CT, United States of America.
Department of Mathematics and Statistics, Connecticut College, New London, CT, United States of America.
PLoS One. 2022 Jun 16;17(6):e0267560. doi: 10.1371/journal.pone.0267560. eCollection 2022.
AlphaFold2 and RoseTTAFold can predict, based solely on sequence, whether a GFP-like protein will post-translationally form a chromophore (the part of the protein responsible for fluorescence). Their training has taught them not only protein structure and folding but also chemistry. AlphaFold2 and RoseTTAFold were used to predict the structures of 21 GFP-like fluorescent proteins whose sequences will post-translationally form a chromophore and of 23 GFP-like non-fluorescent proteins that lack the residues required to form a chromophore. The resulting structures were mined for a series of geometric measurements that are crucial to chromophore formation. Statistical analysis of these measurements showed that both programs conclusively distinguished between chromophore-forming and non-chromophore-forming proteins. A clear distinction between sequences capable of forming a chromophore and those lacking the required residues can be obtained from a single measurement: the RMSD of the overlap of the central alpha helices of the crystal structure of S65T GFP and the AlphaFold2-predicted structure. Only 10 of the 578 GFP-like proteins in the PDB have no chromophore, yet when AlphaFold2 and RoseTTAFold are presented with the sequences of 44 GFP-like proteins that are not in the PDB, they fold the proteins in such a way that one can unequivocally distinguish between those that can and cannot form a chromophore.
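The single-measurement criterion, an RMSD after overlaying two central helices, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes the C-alpha coordinates of matching helix residues have already been extracted into two equal-length arrays, and it uses the standard Kabsch superposition. The names `kabsch_rmsd` and `ideal_helix_ca` are hypothetical, and the coordinates are synthetic (an idealized alpha helix with roughly 1.5 Å rise and 100° turn per residue), not data from the study.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its SVD give the optimal rotation (Kabsch).
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))


def ideal_helix_ca(n, rise=1.5, turn_deg=100.0, radius=2.3):
    """Synthetic C-alpha trace of an idealized alpha helix (illustration only)."""
    t = np.deg2rad(turn_deg) * np.arange(n)
    return np.column_stack([radius * np.cos(t), radius * np.sin(t), rise * np.arange(n)])


# Synthetic stand-ins: a reference helix, a rigidly moved copy, and a stretched one.
reference = ideal_helix_ca(12)
a = 0.7
rot_z = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a), np.cos(a), 0.0],
                  [0.0, 0.0, 1.0]])
moved = reference @ rot_z.T + np.array([5.0, -3.0, 2.0])
stretched = ideal_helix_ca(12, rise=2.0)

print("rigid copy RMSD:", kabsch_rmsd(reference, moved))
print("stretched helix RMSD:", kabsch_rmsd(reference, stretched))
```

A rigidly moved copy superimposes almost exactly, while a helix with a different geometry leaves a residual RMSD. A geometric difference of that kind is what a helix-overlap measurement is sensitive to.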