Milano Chemometrics and QSAR Research Group, University of Milano Bicocca, P.za della Scienza 1, 20126 Milano, Italy.
Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8049 Zurich, Switzerland.
J Chem Inf Model. 2020 Mar 23;60(3):1215-1223. doi: 10.1021/acs.jcim.9b01057. Epub 2020 Mar 2.
Consensus strategies have been widely applied in many different scientific fields, based on the assumption that the fusion of several sources of information increases the outcome reliability. Despite the widespread application of consensus approaches, their advantages in quantitative structure-activity relationship (QSAR) modeling have not been thoroughly evaluated, mainly due to the lack of appropriate large-scale data sets. In this study, we evaluated the advantages and drawbacks of consensus approaches compared to single classification QSAR models. To this end, we used a data set of three properties (androgen receptor binding, agonism, and antagonism) for approximately 4000 molecules with predictions performed by more than 20 QSAR models, made available in a large-scale collaborative project. The individual QSAR models were compared with two consensus approaches, majority voting and the Bayes consensus with discrete probability distributions, in both protective and nonprotective forms. Consensus strategies proved to be more accurate and to better cover the analyzed chemical space than individual QSARs on average, thus motivating their widespread application for property prediction. Scripts and data to reproduce the results of this study are available for download.
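As an illustration of the simpler of the two consensus approaches named above, majority voting over binary class predictions could be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' scripts: the function name `majority_vote`, the `min_agreement` parameter, the use of `None` for a model that gives no prediction for a molecule, and the reading of the "protective" form as requiring a minimum fraction of agreeing votes are all hypothetical choices made for illustration. The Bayes consensus is not sketched here.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(
    predictions: Sequence[Optional[int]],
    min_agreement: float = 0.0,
) -> Optional[int]:
    """Combine class labels from several models for one molecule.

    predictions: one label per model (e.g. 1 = active, 0 = inactive);
        None means that model gave no prediction (assumed convention).
    min_agreement: hypothetical "protective" threshold; the winning class
        must reach at least this fraction of the available votes.
        0.0 corresponds to a plain (nonprotective) vote.
    Returns the consensus label, or None when no class can be assigned.
    """
    # Ignore models that did not predict this molecule.
    votes = [p for p in predictions if p is not None]
    if not votes:
        return None

    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]

    # A tie between the two most frequent classes gives no consensus.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None

    # Protective form: require a minimum share of agreeing votes.
    if top_count / len(votes) < min_agreement:
        return None

    return top_label
```

For example, `majority_vote([1, 1, 0, None])` assigns class 1 from the three available votes, whereas `majority_vote([1, 1, 0], min_agreement=0.75)` leaves the molecule unpredicted because only two of three models agree. Leaving low-agreement molecules unpredicted trades coverage for reliability, which is the kind of trade-off the comparison of protective and nonprotective forms addresses.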