Eriksson Lennart, Andersson Patrik L, Johansson Erik, Tysklind Mats
Umetrics AB, POB 7960, S-907 19, Umeå, Sweden
Mol Divers. 2006 May;10(2):169-86. doi: 10.1007/s11030-006-9024-6. Epub 2006 Jun 13.
This paper introduces principal component analysis (PCA), partial least squares projections to latent structures (PLS), and statistical molecular design (SMD) as useful tools in deriving multi- and megavariate quantitative structure-activity relationship (QSAR) models. Two QSAR data sets from the fields of environmental toxicology and environmental chemistry are worked out in detail, showing the benefits of PCA, PLS and SMD. PCA is useful when overviewing a data set and exploring relationships among compounds and relationships among variables. PLS is the regression extension of PCA and is used for establishing QSARs. SMD is essential for selecting informative training and test sets of compounds for QSAR calibration and validation.
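As a rough illustration of how PCA and PLS relate, the following minimal sketch computes PCA scores by singular value decomposition and fits a single-response PLS model with the NIPALS algorithm. It is not the authors' implementation: the data are synthetic stand-ins for a compound-by-descriptor table, and the names `autoscale` and `pls1` are hypothetical helpers introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a QSAR table (assumption, not data from the paper):
# 30 compounds described by 6 correlated descriptors and one response.
n, k = 30, 6
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, k)) + 0.1 * rng.normal(size=(n, k))
y = latent @ np.array([1.5, -0.8]) + 0.1 * rng.normal(size=n)


def autoscale(A):
    """Mean-center each column and scale it to unit variance."""
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)


Xs = autoscale(X)
ys = y - y.mean()

# PCA: the SVD of the scaled descriptor matrix gives scores (compound
# overview) and loadings (variable relationships).
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
scores = U * s                      # one row per compound
pca_loadings = Vt.T                 # one row per descriptor
explained = s**2 / np.sum(s**2)     # fraction of variance per component


def pls1(X, y, n_components):
    """Single-response PLS via NIPALS; returns regression coefficients."""
    X = X.copy()
    y = y.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y                 # weights: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = X @ w                   # X scores
        tt = t @ t
        p = X.T @ t / tt            # X loadings
        c = (y @ t) / tt            # y loading
        X -= np.outer(t, p)         # deflate X
        y -= c * t                  # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W = np.array(W).T
    P = np.array(P).T
    q = np.array(q)
    # Coefficients expressed in terms of the original (scaled) descriptors.
    return W @ np.linalg.solve(P.T @ W, q)


b = pls1(Xs, ys, n_components=2)
y_hat = Xs @ b
r2 = 1.0 - np.sum((ys - y_hat) ** 2) / np.sum(ys**2)
print("PCA explained variance:", np.round(explained, 3))
print("PLS R2 (fit):", round(float(r2), 3))
```

The fit statistic printed here describes the training data only; judging predictive ability would require cross-validation or an external test set, which this sketch does not include.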