Bell Eric W, Brown Benjamin P, Meiler Jens
bioRxiv. 2025 May 2:2025.02.27.640629. doi: 10.1101/2025.02.27.640629.
Noncanonical amino acids (NCAAs) occupy an important place in both natural biology and synthetic applications. However, modeling these amino acids still lies outside the capabilities of most deep learning methods because training data for this task are sparse. Biophysical methods such as Rosetta, by contrast, can excel at modeling NCAAs. We discuss the various aspects of parameterizing an NCAA for use in Rosetta and identify rotamer distribution modeling as one of the aspects of NCAA parameterization with the greatest impact on Rosetta performance. To this end, we also present FakeRotLib, a method that uses statistical fitting of small-molecule conformers to create rotamer distributions. We find that FakeRotLib outperforms existing methods in a fraction of the time and can parameterize NCAA types previously unmodeled by Rosetta.
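The abstract does not spell out how FakeRotLib fits its distributions. As a rough, hypothetical illustration of what turning a small-molecule conformer ensemble into a rotamer distribution can involve, the sketch below assumes one side-chain chi dihedral has already been measured (in degrees) from each conformer. It assigns each sample to the nearest of three assumed staggered wells (60°, 180°, -60°) and reports a probability, circular mean, and circular standard deviation per well, which are the kinds of quantities rotamer libraries typically tabulate. The function names and the nearest-well assignment are illustrative choices, not FakeRotLib's actual algorithm.

```python
import math
from collections import defaultdict


def angular_diff(a, b):
    """Smallest signed difference a - b in degrees, in (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d


def circular_mean_std(angles_deg):
    """Circular mean and circular standard deviation, both in degrees."""
    n = len(angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    mean = math.degrees(math.atan2(s, c))
    # Mean resultant length R; clamp to avoid log(0) and float overshoot above 1.
    r = min(max(math.hypot(s, c), 1e-12), 1.0)
    # Circular standard deviation: sqrt(-2 ln R), converted to degrees.
    std = math.degrees(math.sqrt(max(0.0, -2.0 * math.log(r))))
    return mean, std


def fit_rotamer_wells(chi_angles, centers=(60.0, 180.0, -60.0)):
    """Hypothetical helper: bin chi samples into wells and fit each well.

    Returns {center: (probability, circular_mean, circular_std)}; an empty
    well gets probability 0.0, its center as the mean, and std None.
    """
    groups = defaultdict(list)
    for chi in chi_angles:
        # Assign each sample to the well whose center is closest on the circle.
        nearest = min(centers, key=lambda c: abs(angular_diff(chi, c)))
        groups[nearest].append(chi)

    n = len(chi_angles)
    fit = {}
    for center in centers:
        samples = groups.get(center, [])
        if not samples:
            fit[center] = (0.0, center, None)
            continue
        mean, std = circular_mean_std(samples)
        fit[center] = (len(samples) / n, mean, std)
    return fit


if __name__ == "__main__":
    # Made-up chi1 values standing in for dihedrals measured from conformers.
    samples = [55.0, 65.0, 175.0, -175.0, -60.0, -62.0, -58.0, 180.0]
    for center, (p, mu, sd) in fit_rotamer_wells(samples).items():
        print(center, round(p, 3), round(mu, 2), sd and round(sd, 2))
```

Circular statistics are used because dihedral angles wrap at ±180°: a naive arithmetic mean of 175° and -175° would give 0° instead of 180°. A real parameterization would also need to handle multiple chi angles jointly and, for Rosetta, backbone dependence, which this sketch omits.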