Patra Sourav Kumar, Randolph Nicholas, Kuhlman Brian, Dieckhaus Henry, Betts Laurie, Douglas Jordan, Wills Peter R, Carter Charles W
Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7260, USA.
Department of Physics, University of Auckland, Auckland, New Zealand.
Struct Dyn. 2025 Apr 25;12(2):024701. doi: 10.1063/4.0000294. eCollection 2025 Mar.
Protein design plays a key role in our efforts to work out how genetic coding began. That effort relies on urzymes: small, conserved excerpts from full-length aminoacyl-tRNA synthetases that remain active. Urzymes require design to connect disjoint pieces and to repair the exposed nonpolar patches created by removing large domains. Rosetta allowed us to create the first urzymes, but those urzymes were only sparingly soluble. We could measure activity, but it was hard to concentrate those samples to the levels required for structural biology. Here, we used the deep learning algorithms ProteinMPNN and AlphaFold2 to redesign a set of optimized LeuAC urzymes derived from leucyl-tRNA synthetase. Using principal component analysis, we selected a balanced, representative subset of eight variants for testing. Most tested variants are much more soluble than the original LeuAC. They also span a range of catalytic proficiency and amino acid specificity. The data enable detailed statistical analyses of the sources of both solubility and specificity. In that way, we show how to begin to uncover the elements of protein chemistry that were hidden within the neural networks. Deep learning networks have thus helped us surmount several vexing obstacles to further investigations into the nature of ancestral proteins. Finally, we discuss how the eight variants might resemble a sample drawn from a population similar to one subject to natural selection.
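The abstract does not say how principal component analysis was used to pick the eight variants, so the following is only a minimal sketch of one plausible approach, not the authors' procedure. It assumes each redesigned variant is described by a row of numeric descriptors (the `features` matrix here is random placeholder data, not real measurements), projects the rows onto the first two principal components, and then uses greedy farthest-point selection (an assumption swapped in for illustration) to choose eight variants spread across that space. The function names `pca_scores` and `pick_spread_subset` are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)  # center each descriptor column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def pick_spread_subset(scores, k):
    """Greedy farthest-point selection in PC space (illustrative assumption).

    Starts with the point nearest the centroid, then repeatedly adds the
    point farthest from everything chosen so far.
    """
    centroid = scores.mean(axis=0)
    first = int(np.argmin(np.linalg.norm(scores - centroid, axis=1)))
    chosen = [first]
    # Distance from every point to its nearest already-chosen point.
    min_dist = np.linalg.norm(scores - scores[first], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(scores - scores[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return chosen


# Placeholder data: 40 hypothetical designs x 6 hypothetical descriptors.
rng = np.random.default_rng(0)
features = rng.normal(size=(40, 6))

scores = pca_scores(features)
subset = pick_spread_subset(scores, k=8)
print(subset)  # row indices of the eight selected designs
```

Farthest-point selection favors coverage of the extremes of the design space; a study that wanted the subset to mirror the population density instead might cluster the scores and take one variant per cluster.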