Day Erin C, Chittari Supraja S, Cunha Keila C, Zhao Roy J, Dodds James N, Davis Delaney C, Baker Erin S, Berlow Rebecca B, Shea Joan-Emma, Kulkarni Rishikesh U, Knight Abigail S
Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States.
Department of Chemistry and Biochemistry, University of California, Santa Barbara, Santa Barbara, California 93106, USA.
Chem. 2024 Nov 14;10(11):3444-3458. doi: 10.1016/j.chempr.2024.07.025. Epub 2024 Sep 6.
Understanding how a macromolecule's primary sequence governs its conformational landscape is crucial for elucidating its function, yet these design principles are still emerging for macromolecules with intrinsic disorder. Herein, we introduce a high-throughput workflow that combines a practical colorimetric conformational assay, a semi-automated sequencing protocol using MALDI-MS/MS, and a generalizable sequence-structure algorithm. Using a model system of 20mer peptidomimetics containing polar glycine and hydrophobic butylglycine residues, we identified nine classifications of conformational disorder and isolated 122 unique sequences across varied compositions and conformations. Conformational distributions of three compositionally identical library sequences were corroborated through atomistic simulations and ion mobility spectrometry coupled with liquid chromatography. A data-driven strategy was developed using existing sequence variables and data-derived 'motifs' to inform a machine learning algorithm for conformation prediction. This multifaceted approach enhances our understanding of sequence-conformation relationships and offers a powerful tool for accelerating the discovery of materials with conformational control.