Toropova Alla P, Raškova Maria, Raška Ivan, Toropov Andrey A
Department of Environmental Health Science, Laboratory of Environmental Chemistry and Toxicology, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milan, Italy.
3rd Medical Department, 1st Faculty of Medicine, Charles University in Prague, U Nemocnice 1, 12808 Prague 2, Czech Republic.
Theor Chem Acc. 2021;140(2):15. doi: 10.1007/s00214-020-02707-8. Epub 2021 Jan 22.
An algorithm is proposed for building a model of the biological activity of peptides as a mathematical function of their amino acid sequence. The general scheme is as follows. The total set of available data is split into an active training set, a passive training set, a calibration set, and a validation set. The training sets (active and passive) and the calibration set together serve to generate the model, in which each amino acid is assigned a correlation weight. The numerical values of the correlation weights are calculated by the Monte Carlo method using the CORAL software (http://www.insilico.eu/coral). The target function is designed to give the best result for the calibration set rather than for the training set. The final check of the model is carried out on the validation set, whose peptides are not visible while the model is being built. The computational experiments described confirm that the approach can serve as a tool for designing predictive models of the biological activity of peptides (expressed as pIC50).
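The scheme can be illustrated with a minimal Python sketch. It is not the CORAL implementation, and several choices here are assumptions made only for illustration: the descriptor is taken as the plain sum of per-residue correlation weights, the active and passive training sets are merged into one training set that fits a straight line from descriptor to pIC50, the target function is the root-mean-square error on the calibration set, and the Monte Carlo step is a simple random perturbation that is kept only when the calibration error does not increase. All function names are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def descriptor(seq, cw):
    """Sum of correlation weights over the residues of a peptide (assumed form)."""
    return sum(cw[aa] for aa in seq)

def fit_line(xs, ys):
    """Ordinary least squares for y = c0 + c1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # all descriptors equal: fall back to the mean
        return my, 0.0
    c1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - c1 * mx, c1

def rmse(data, cw, c0, c1):
    """Root-mean-square error of the linear model on (sequence, pIC50) pairs."""
    sq = [(c0 + c1 * descriptor(s, cw) - y) ** 2 for s, y in data]
    return (sum(sq) / len(sq)) ** 0.5

def calibration_error(cw, train, calib):
    """Fit the line on the training set, score it on the calibration set."""
    c0, c1 = fit_line([descriptor(s, cw) for s, _ in train],
                      [y for _, y in train])
    return rmse(calib, cw, c0, c1)

def optimize(train, calib, steps=2000, step_size=0.1, seed=0):
    """Random-perturbation search for weights that minimize calibration error."""
    rng = random.Random(seed)
    cw = {aa: 1.0 for aa in AMINO_ACIDS}
    best = calibration_error(cw, train, calib)
    for _ in range(steps):
        aa = rng.choice(AMINO_ACIDS)
        old = cw[aa]
        cw[aa] = old + rng.gauss(0.0, step_size)  # perturb one weight
        err = calibration_error(cw, train, calib)
        if err <= best:
            best = err      # keep the change
        else:
            cw[aa] = old    # revert the change
    return cw, best
```

In this sketch the validation set never enters `optimize`; once the weights are fixed, it would be scored a single time with `rmse`, using the line fitted on the training set.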