Developing Collaborative QSAR Models Without Sharing Structures.

Authors

Gedeck Peter, Skolnik Suzanne, Rodde Stephane

Affiliations

Peter Gedeck LLC, 2309 Grove Avenue, Falls Church, Virginia 22046, United States.

Novartis Institute for Biomedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States.

Publication information

J Chem Inf Model. 2017 Aug 28;57(8):1847-1858. doi: 10.1021/acs.jcim.7b00315. Epub 2017 Jul 25.

Abstract

It is widely understood that QSAR models greatly improve if more data are used. However, irrespective of model quality, once chemical structures diverge too far from the initial data set, the predictive performance of a model degrades quickly. To increase the applicability domain we need to increase the diversity of the training set. This can be achieved by combining data from diverse sources. Public data can be easily included; however, proprietary data may be more difficult to add due to intellectual property concerns. In this contribution, we will present a method for the collaborative development of linear regression models that addresses this problem. The method differs from other past approaches, because data are only shared in an aggregated form. This prohibits access to individual data points and therefore avoids the disclosure of confidential structural information. The final models are equivalent to models that were built with combined data sets.
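The equivalence claimed in the last sentence can be illustrated with ordinary least squares. The abstract does not say which aggregates are exchanged, so the sketch below is an assumption rather than the authors' protocol: each partner shares only the sums X^T X and X^T y computed on its own data. The function names, the two simulated partners, and the random descriptors are all hypothetical.

```python
import numpy as np

def local_aggregates(X, y):
    # Each partner runs this privately on its own descriptors X and activities y.
    Xb = np.column_stack([np.ones(X.shape[0]), X])  # prepend an intercept column
    return Xb.T @ Xb, Xb.T @ y  # assumed aggregates: only these sums are shared

def fit_from_aggregates(aggregates):
    # Summing the per-partner aggregates yields the aggregates of the pooled data.
    xtx = sum(a[0] for a in aggregates)
    xty = sum(a[1] for a in aggregates)
    return np.linalg.solve(xtx, xty)  # solve the normal equations

# Simulated data for two hypothetical partners (not from the paper).
rng = np.random.default_rng(0)
true_coef = np.array([0.5, -1.0, 2.0])
X_a = rng.normal(size=(40, 3))
X_b = rng.normal(size=(60, 3))
y_a = 1.0 + X_a @ true_coef + rng.normal(scale=0.1, size=40)
y_b = 1.0 + X_b @ true_coef + rng.normal(scale=0.1, size=60)

# Model built from shared aggregates only.
coef_shared = fit_from_aggregates(
    [local_aggregates(X_a, y_a), local_aggregates(X_b, y_b)]
)

# Reference model built from the pooled raw data.
X_all = np.column_stack([np.ones(100), np.vstack([X_a, X_b])])
y_all = np.concatenate([y_a, y_b])
coef_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]

print(np.allclose(coef_shared, coef_pooled))
```

In this sketch the coefficients match because the normal equations depend on the data only through these sums, so no individual row of X has to leave a partner's site. Whether such sums are sufficiently non-revealing for a given descriptor set is a separate question that the sketch does not address.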
