Laboratory for Atomistic and Molecular Mechanics, Massachusetts Institute of Technology, Cambridge, MA 02139.
Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139.
Proc Natl Acad Sci U S A. 2022 Oct 4;119(40):e2209524119. doi: 10.1073/pnas.2209524119. Epub 2022 Sep 26.
Collagen is the most abundant structural protein in humans, providing crucial mechanical properties, including high strength and toughness, in tissues. Collagen-based biomaterials are therefore used for tissue repair and regeneration. Using collagen effectively during materials processing ex vivo, and for its subsequent function in vivo, requires stability over wide temperature ranges to avoid denaturation and loss of structure; this stability is measured as the melting temperature (Tm). Although significant research has examined how collagen primary amino acid sequences correspond to Tm values, a robust framework for designing collagen sequences with a specific Tm remains a challenge. Here, we develop a general model using a genetic algorithm within a deep learning framework to design collagen sequences with specific Tm values. We report 1,000 de novo collagen sequences, and we show that this model can efficiently generate collagen sequences whose Tm values we verify using both experimental and computational methods. We find that the model predicts Tm values to within a few degrees Celsius. Further, using this model, we conduct a high-throughput study to identify the most frequently occurring collagen triplets that can be directly incorporated into collagen. We also find that the number of hydrogen bonds within collagen, calculated with molecular dynamics (MD), is directly correlated with the experimental measurement of triple-helical quality. Ultimately, we see this work as a critical step toward helping researchers develop collagen sequences with specific Tm values for intended materials manufacturing methods and biomedical applications, realizing a mechanistic materials-by-design paradigm.
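The design loop described in the abstract, a genetic algorithm searching sequence space with a learned predictor as its fitness function, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the residue alphabet, the sequence length, the population settings, and above all `toy_predict_tm`, an arbitrary scoring function that merely stands in for a trained deep learning model and has no physical meaning. It is not the authors' implementation.

```python
import random

# Hypothetical alphabet for the X and Y positions of Gly-X-Y triplets.
# "O" is used here for hydroxyproline (an illustrative convention).
RESIDUES = "APOEKRDQ"
N_TRIPLETS = 8  # illustrative sequence length: 8 triplets = 24 residues


def toy_predict_tm(seq):
    """Placeholder for a trained melting-temperature predictor.

    NOT a physical model: the coefficients are arbitrary and exist only
    so the search has something to optimize.
    """
    return 10.0 + 2.0 * seq.count("P") + 3.0 * seq.count("O")


def random_sequence(rng):
    """Build a random sequence of Gly-X-Y triplets."""
    return "".join(
        "G" + rng.choice(RESIDUES) + rng.choice(RESIDUES)
        for _ in range(N_TRIPLETS)
    )


def mutate(seq, rng, rate=0.1):
    """Randomly replace X/Y residues; glycine at every third position is kept."""
    chars = list(seq)
    for i in range(len(chars)):
        if i % 3 != 0 and rng.random() < rate:
            chars[i] = rng.choice(RESIDUES)
    return "".join(chars)


def crossover(a, b, rng):
    """Single-point crossover, cutting only on a triplet boundary."""
    cut = 3 * rng.randrange(1, N_TRIPLETS)
    return a[:cut] + b[cut:]


def design(target_tm, predict=toy_predict_tm, pop_size=50, generations=40, seed=0):
    """Search for a sequence whose predicted Tm is close to target_tm."""
    rng = random.Random(seed)
    population = [random_sequence(rng) for _ in range(pop_size)]

    def error(seq):
        return abs(predict(seq) - target_tm)

    for _ in range(generations):
        population.sort(key=error)
        parents = population[: pop_size // 5]  # elitism: the best fifth survives
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=error)


best = design(target_tm=40.0)
print(best, toy_predict_tm(best))
```

Swapping `toy_predict_tm` for a real trained model through the `predict` argument is the only change the search loop would need; restricting mutation and crossover to triplet boundaries and non-glycine positions is one simple way to keep every candidate a valid Gly-X-Y repeat.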