Department of Biochemistry, University of Washington, Seattle, WA, USA.
Institute for Protein Design, University of Washington, Seattle, WA, USA.
Nature. 2019 Jun;570(7761):390-394. doi: 10.1038/s41586-019-1274-4. Epub 2019 Jun 5.
Online citizen science projects such as GalaxyZoo, Eyewire and Phylo have proven very successful for data collection, annotation and processing, but for the most part have harnessed human pattern-recognition skills rather than human creativity. An exception is the game EteRNA, in which game players learn to build new RNA structures by exploring the discrete two-dimensional space of Watson-Crick base pairing possibilities. Building new proteins, however, is a more challenging task to present in a game, as both the representation and evaluation of a protein structure are intrinsically three-dimensional. We posed the challenge of de novo protein design in the online protein-folding game Foldit. Players were presented with a fully extended peptide chain and challenged to craft a folded protein structure and an amino acid sequence encoding that structure. After many iterations of player design, analysis of the top-scoring solutions and subsequent game improvement, Foldit players can now, starting from an extended polypeptide chain, generate a diversity of protein structures and sequences that encode them in silico. One hundred forty-six Foldit player designs with sequences unrelated to naturally occurring proteins were encoded in synthetic genes; 56 were found to be expressed and soluble in Escherichia coli, and to adopt stable monomeric folded structures in solution. The diversity of these structures is unprecedented in de novo protein design, representing 20 different folds, including a new fold not observed in natural proteins. High-resolution structures were determined for four of the designs, and are nearly identical to the player models. This work makes explicit the considerable implicit knowledge that contributes to success in de novo protein design, and shows that citizen scientists can discover creative new solutions to outstanding scientific challenges such as the protein design problem.