Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA.
Nat Commun. 2022 Aug 4;13(1):4541. doi: 10.1038/s41467-022-31955-4.
In vitro selection queries large combinatorial libraries for sequence-defined polymers with target binding and reaction catalysis activity. While the total sequence space of these libraries can extend beyond 10^80 sequences, practical considerations limit starting sequences to ≤~10^15 distinct molecules. Selection-induced sequence convergence and limited sequencing depth further constrain experimentally observable sequence space. To address these limitations, we integrate experimental and machine learning approaches to explore regions of sequence space unrelated to experimentally derived variants. We perform in vitro selections to discover highly side-chain-functionalized nucleic acid polymers (HFNAPs) with potent affinities for a target small molecule (daunomycin K_D = 5-65 nM). We then use the selection data to train a conditional variational autoencoder (CVAE) machine learning model to generate diverse and unique HFNAP sequences with high daunomycin affinities (K_D = 9-26 nM), even though they are unrelated in sequence to experimental polymers. Coupling in vitro selection with a machine learning model thus enables direct generation of active variants, demonstrating a new approach to the discovery of functional biopolymers.
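As an illustration of the CVAE idea named in the abstract, the sketch below shows the data flow of a conditional variational autoencoder over short sequences: encode a sequence together with a condition label, sample a latent vector with the reparameterization trick, decode back to per-position probabilities, and generate new sequences by sampling the latent prior under a chosen condition. It is not the authors' model. The nucleotide-level alphabet, the single scalar condition, the layer sizes, and every name here (`TinyCVAE`, `one_hot`, `high_affinity`) are assumptions made for illustration, and the weights are random and untrained.

```python
import math
import random

ALPHABET = "ACGT"  # assumption: sequences are represented at the nucleotide level


def one_hot(seq):
    """Flat one-hot encoding, len(ALPHABET) entries per position."""
    return [1.0 if base == a else 0.0 for base in seq for a in ALPHABET]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rng, rows, cols):
    """Small random weights; a real model would learn these."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyCVAE:
    """Hypothetical single-hidden-layer CVAE, untrained, for illustration only."""

    def __init__(self, seq_len, cond_dim=1, latent_dim=2, hidden=16, seed=0):
        rng = random.Random(seed)
        in_dim = seq_len * len(ALPHABET)
        self.seq_len, self.latent_dim = seq_len, latent_dim
        self.W_enc = rand_matrix(rng, hidden, in_dim + cond_dim)
        self.W_mu = rand_matrix(rng, latent_dim, hidden)
        self.W_logvar = rand_matrix(rng, latent_dim, hidden)
        self.W_dec = rand_matrix(rng, hidden, latent_dim + cond_dim)
        self.W_out = rand_matrix(rng, in_dim, hidden)

    def encode(self, x, c):
        """Map (sequence, condition) to the mean and log-variance of q(z | x, c)."""
        h = [math.tanh(v) for v in matvec(self.W_enc, x + c)]
        return matvec(self.W_mu, h), matvec(self.W_logvar, h)

    def decode(self, z, c):
        """Map (latent, condition) to a softmax over the alphabet at each position."""
        h = [math.tanh(v) for v in matvec(self.W_dec, z + c)]
        logits = matvec(self.W_out, h)
        k = len(ALPHABET)
        probs = []
        for i in range(self.seq_len):
            row = logits[k * i: k * i + k]
            top = max(row)
            exps = [math.exp(v - top) for v in row]
            total = sum(exps)
            probs.append([e / total for e in exps])
        return probs

    def loss(self, x, c, rng):
        """Negative ELBO: reconstruction cross-entropy plus KL(q(z|x,c) || N(0, I))."""
        mu, logvar = self.encode(x, c)
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
        flat = [p for row in self.decode(z, c) for p in row]
        recon = -sum(xi * math.log(pi) for xi, pi in zip(x, flat))
        kl = -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))
        return recon + kl

    def generate(self, c, rng):
        """Sample z from the prior and decode the most likely base at each position."""
        z = [rng.gauss(0.0, 1.0) for _ in range(self.latent_dim)]
        return "".join(ALPHABET[row.index(max(row))] for row in self.decode(z, c))


rng = random.Random(1)
model = TinyCVAE(seq_len=8)
high_affinity = [1.0]  # hypothetical condition label, e.g. "enriched in selection"
example_loss = model.loss(one_hot("ACGTACGT"), high_affinity, rng)
generated = model.generate(high_affinity, rng)
```

In a real workflow the weights would be fitted by minimizing `loss` over the selection data with an automatic-differentiation framework, and `generate` would then be called with the condition of interest. Conditioning on a label is what distinguishes a CVAE from a plain VAE: the same latent space can be decoded under different conditions.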