van Teijlingen Alexander, Tuttle Tell
Department of Chemistry, University of Strathclyde, 295 Cathedral Street, Glasgow G1 1XL, U.K.
J Chem Theory Comput. 2021 May 11;17(5):3221-3232. doi: 10.1021/acs.jctc.1c00159. Epub 2021 Apr 27.
Self-assembling peptide nanostructures have been shown to be of great importance in nature and have many promising applications, for example in medicine as drug-delivery vehicles, biosensors, and antivirals. Because such peptides are very promising candidates for the growing field of bottom-up manufacture of functional nanomaterials, previous work (Frederix et al., 2011 and 2015) screened all possible amino acid combinations for di- and tripeptides in search of such materials. However, the number of linear combinations of the 20 amino acids grows as 20^n with peptide length n (160,000 for tetrapeptides alone), which makes exhaustive simulation of all tetrapeptides and longer sequences infeasible. We have therefore developed an active machine-learning method (also known as "iterative learning" or an "evolutionary search method") that combines a lower-resolution data set covering the whole search space with a just-in-time high-resolution data set, which further analyzes the target peptides selected by the lower-resolution model. On each iteration, the newly generated data are used to improve both the lower- and higher-resolution models in the search for ideal candidates. Curation of the lower-resolution data set, based on criteria such as log P, is explored as a way to control which candidates are selected. A major aim of the method is to produce the best results at the lowest computational cost. The method has been developed to be broadly applicable: with minor changes to the algorithm, it can be used on other search spaces and in other areas of research.
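The iterative loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring function `high_res_score`, the additive per-residue surrogate in `fit_residue_weights`, and all parameter names are hypothetical stand-ins chosen only to show the pattern of a cheap model ranking the whole search space, an expensive evaluation run only on the selected peptides, and a refit on each iteration.

```python
import itertools
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 amino acids


def high_res_score(peptide):
    """Hypothetical stand-in for an expensive, high-resolution evaluation.

    Toy rule (an assumption, not from the paper): reward aromatic
    residues and penalize charged ones.
    """
    aromatic = sum(peptide.count(a) for a in "FWY")
    charged = sum(peptide.count(a) for a in "DEKR")
    return aromatic - 0.5 * charged


def fit_residue_weights(labelled):
    """Fit a cheap additive surrogate: one weight per residue type.

    Each labelled peptide spreads its score evenly over its residues;
    a residue's weight is the mean of the shares it received.
    """
    totals = {a: 0.0 for a in AMINO_ACIDS}
    counts = {a: 0 for a in AMINO_ACIDS}
    for peptide, score in labelled.items():
        for a in peptide:
            totals[a] += score / len(peptide)
            counts[a] += 1
    return {a: totals[a] / counts[a] for a in AMINO_ACIDS if counts[a]}


def predict(peptide, weights):
    """Surrogate prediction; residues never observed contribute 0."""
    return sum(weights.get(a, 0.0) for a in peptide)


def active_search(length=3, rounds=5, batch=20, seed=0):
    """Return {peptide: high_res_score} for every peptide evaluated."""
    rng = random.Random(seed)
    # The full search space: 20**length sequences.
    pool = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=length)]

    # Seed the high-resolution data set with a random batch.
    labelled = {p: high_res_score(p) for p in rng.sample(pool, batch)}

    for _ in range(rounds):
        # Refit the cheap model on all high-resolution data so far.
        weights = fit_residue_weights(labelled)
        # Rank every unevaluated peptide with the cheap model.
        candidates = [p for p in pool if p not in labelled]
        candidates.sort(key=lambda p: predict(p, weights), reverse=True)
        # Spend the expensive evaluation only on the top-ranked batch.
        for p in candidates[:batch]:
            labelled[p] = high_res_score(p)
    return labelled


if __name__ == "__main__":
    results = active_search(length=3, rounds=5, batch=20, seed=0)
    best = max(results, key=results.get)
    print(len(results), "peptides evaluated; best so far:", best, results[best])
```

The demo uses tripeptides (8,000 sequences) only to keep the run short; passing `length=4` enumerates the 160,000 tetrapeptides. In this sketch the expensive function is called `batch * (rounds + 1)` times, which is the quantity an approach of this kind tries to keep small relative to the size of the search space.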