Istituto di Ricerche Farmacologiche Mario Negri, Via La Masa 19, 20156 Milano, Italy.
Chem Biol Drug Des. 2011 May;77(5):343-60. doi: 10.1111/j.1747-0285.2011.01109.x. Epub 2011 Mar 25.
The simplified molecular-input line-entry system (SMILES) and the IUPAC International Chemical Identifier (InChI) were examined as representations of molecular structure for quantitative structure-activity relationships (QSAR) that predict the inhibitory activity of styrylquinoline derivatives against human immunodeficiency virus type 1 (HIV-1). The best model built from optimal SMILES-based descriptors gives n = 26, r² = 0.6330, q² = 0.5812, s = 0.502, F = 41 for the training set and n = 10, r² = 0.7493, r²(pred) = 0.6235, R²(m) = 0.537, s = 0.541, F = 24 for the validation set. The best model built from optimal InChI-based descriptors gives n = 26, r² = 0.8673, q² = 0.8456, s = 0.302, F = 157 for the training set and n = 10, r² = 0.8562, r²(pred) = 0.7715, R²(m) = 0.819, s = 0.329, F = 48 for the validation set. The InChI-based model is therefore preferable. Both the SMILES-based and the InChI-based approaches were checked with five random splits into training and test sets.
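The reported F values can be cross-checked against r² and n. The sketch below assumes, as an illustration rather than a statement of the authors' procedure, that each model is a one-variable linear regression of activity on a single optimal descriptor, so that F = r²(n − 2)/(1 − r²) with 1 and n − 2 degrees of freedom. The names `f_statistic` and `REPORTED` are hypothetical; the numbers are the ones given in the abstract.

```python
"""Consistency check of the reported F statistics (a sketch).

Assumption: each model is a one-variable linear regression of activity on a
single optimal descriptor, so F = r^2 (n - 2) / (1 - r^2).
"""


def f_statistic(r2: float, n: int, k: int = 1) -> float:
    """Fisher F for a linear model with k descriptors fitted to n compounds."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# (label, n, r^2, reported F) as stated in the abstract
REPORTED = [
    ("SMILES training", 26, 0.6330, 41),
    ("SMILES validation", 10, 0.7493, 24),
    ("InChI training", 26, 0.8673, 157),
    ("InChI validation", 10, 0.8562, 48),
]

if __name__ == "__main__":
    for label, n, r2, f_reported in REPORTED:
        f = f_statistic(r2, n)
        # Print the recomputed F next to the value given in the abstract
        print(f"{label}: F = {f:.1f} (reported {f_reported})")
```

Under this assumption each recomputed value (about 41.4, 23.9, 156.9 and 47.6) rounds to the reported F, which suggests the statistics are internally consistent with single-descriptor regressions.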