Developing Collaborative QSAR Models Without Sharing Structures.

Authors

Gedeck Peter, Skolnik Suzanne, Rodde Stephane

Affiliations

Peter Gedeck LLC, 2309 Grove Avenue, Falls Church, Virginia 22046, United States.

Novartis Institute for Biomedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States.

Publication information

J Chem Inf Model. 2017 Aug 28;57(8):1847-1858. doi: 10.1021/acs.jcim.7b00315. Epub 2017 Jul 25.

DOI: 10.1021/acs.jcim.7b00315
PMID: 28723087
Abstract

It is widely understood that QSAR models greatly improve if more data are used. However, irrespective of model quality, once chemical structures diverge too far from the initial data set, the predictive performance of a model degrades quickly. To increase the applicability domain we need to increase the diversity of the training set. This can be achieved by combining data from diverse sources. Public data can be easily included; however, proprietary data may be more difficult to add due to intellectual property concerns. In this contribution, we will present a method for the collaborative development of linear regression models that addresses this problem. The method differs from other past approaches, because data are only shared in an aggregated form. This prohibits access to individual data points and therefore avoids the disclosure of confidential structural information. The final models are equivalent to models that were built with combined data sets.
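The claim that a model built from aggregated data can equal one built from the combined data sets can be illustrated for ordinary least squares. The sketch below assumes each partner shares only the sums X^T X and X^T y computed over its own compounds. That choice of aggregate is an assumption made for illustration, since the abstract does not say which aggregated form the authors exchange. All function names and the toy descriptor values are hypothetical.

```python
# Sketch only: assumes the shared aggregates are X^T X and X^T y.
# The abstract does not specify the aggregated form; this is an illustration.

def local_aggregates(X, y):
    """Return (X^T X, X^T y) for one partner; individual rows stay local."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return xtx, xty

def add_aggregates(a, b):
    """Element-wise sum of two partners' aggregates."""
    (xtx_a, xty_a), (xtx_b, xty_b) = a, b
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(xtx_a, xtx_b)]
    xty = [u + v for u, v in zip(xty_a, xty_b)]
    return xtx, xty

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def fit(aggregates):
    """Least-squares coefficients from the normal equations (X^T X) w = X^T y."""
    xtx, xty = aggregates
    return solve(xtx, xty)

# Hypothetical toy data: first column is the intercept, the rest are descriptors.
X_a = [[1.0, 0.5, 1.2], [1.0, 1.5, 0.3], [1.0, 2.0, 2.2]]
y_a = [1.1, 2.0, 3.9]
X_b = [[1.0, 0.1, 0.4], [1.0, 3.0, 1.0], [1.0, 2.5, 0.1], [1.0, 0.7, 2.9]]
y_b = [0.3, 3.8, 2.6, 3.2]

# Model from summed aggregates versus model from physically pooled rows.
w_shared = fit(add_aggregates(local_aggregates(X_a, y_a), local_aggregates(X_b, y_b)))
w_pooled = fit(local_aggregates(X_a + X_b, y_a + y_b))
```

Under this assumption the two coefficient vectors agree up to floating-point rounding, because the pooled X^T X and X^T y are the sums of the per-partner matrices and vectors.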

Similar articles

1
Developing Collaborative QSAR Models Without Sharing Structures.
J Chem Inf Model. 2017 Aug 28;57(8):1847-1858. doi: 10.1021/acs.jcim.7b00315. Epub 2017 Jul 25.
2
The project data sphere initiative: accelerating cancer research by sharing data.
Oncologist. 2015 May;20(5):464-e20. doi: 10.1634/theoncologist.2014-0431. Epub 2015 Apr 15.
3
Combinatorial QSAR modeling of chemical toxicants tested against Tetrahymena pyriformis.
J Chem Inf Model. 2008 Apr;48(4):766-84. doi: 10.1021/ci700443v. Epub 2008 Mar 1.
4
A novel automated lazy learning QSAR (ALL-QSAR) approach: method development, applications, and virtual screening of chemical databases using validated ALL-QSAR models.
J Chem Inf Model. 2006 Sep-Oct;46(5):1984-95. doi: 10.1021/ci060132x.
5
Collaborative Profile-QSAR: A Natural Platform for Building Collaborative Models among Competing Companies.
J Chem Inf Model. 2021 Apr 26;61(4):1603-1616. doi: 10.1021/acs.jcim.0c01342. Epub 2021 Apr 12.
6
Exploring the QSAR's predictive truthfulness of the novel N-tuple discrete derivative indices on benchmark datasets.
SAR QSAR Environ Res. 2017 May;28(5):367-389. doi: 10.1080/1062936X.2017.1326403.
7
Structural similarity based kriging for quantitative structure activity and property relationship modeling.
J Chem Inf Model. 2014 Jul 28;54(7):1833-49. doi: 10.1021/ci500110v. Epub 2014 Jun 25.
8
Comparison of MLR, PLS and GA-MLR in QSAR analysis.
SAR QSAR Environ Res. 2003 Oct-Dec;14(5-6):433-45. doi: 10.1080/10629360310001624015.
9
The Relative Importance of Domain Applicability Metrics for Estimating Prediction Errors in QSAR Varies with Training Set Diversity.
J Chem Inf Model. 2015 Jun 22;55(6):1098-107. doi: 10.1021/acs.jcim.5b00110. Epub 2015 Jun 4.
10
Combinatorial QSAR of ambergris fragrance compounds.
J Chem Inf Comput Sci. 2004 Mar-Apr;44(2):582-95. doi: 10.1021/ci034203t.

Cited by

1
Novel tricycle expanded purine nucleosides with pan-viral activity.
Bioorg Med Chem. 2025 Dec 1;130:118384. doi: 10.1016/j.bmc.2025.118384. Epub 2025 Sep 8.
2
Structure-Activity relationships of replacements for the triazolopyridazine of Anti-Cryptosporidium lead SLU-2633.
Bioorg Med Chem. 2023 May 15;86:117295. doi: 10.1016/j.bmc.2023.117295. Epub 2023 Apr 28.
3
In silico toxicology: From structure-activity relationships towards deep learning and adverse outcome pathways.
Wiley Interdiscip Rev Comput Mol Sci. 2020 Jul-Aug;10(4):e1475. doi: 10.1002/wcms.1475. Epub 2020 Mar 31.
4
A comparison of molecular representations for lipophilicity quantitative structure-property relationships with results from the SAMPL6 logP Prediction Challenge.
J Comput Aided Mol Des. 2020 May;34(5):523-534. doi: 10.1007/s10822-020-00279-0. Epub 2020 Jan 13.