
Ensemble feature selection: consistent descriptor subsets for multiple QSAR models.

Authors

Dutta Debojyoti, Guha Rajarshi, Wild David, Chen Ting

Affiliation

School of Informatics, Indiana University, Bloomington, Indiana 47406, USA.

Publication information

J Chem Inf Model. 2007 May-Jun;47(3):989-97. doi: 10.1021/ci600563w. Epub 2007 Apr 4.

DOI: 10.1021/ci600563w
PMID: 17407280

Abstract

Selecting a small subset of descriptors from a large pool to build a predictive quantitative structure-activity relationship (QSAR) model is an important step in the QSAR modeling process. In general, subset selection is very hard to solve, even approximately, with guaranteed performance bounds. Traditional approaches employ deterministic or stochastic methods to obtain a descriptor subset that leads to an optimal model of a single type (such as linear regression or a neural network). With the development of ensemble modeling approaches, multiple models of differing types are individually developed resulting in different descriptor subsets for each model type. However, it is advantageous, from the point of view of developing interpretable QSAR models, to have a single set of descriptors that can be used for different model types. In this paper, we describe an approach to the selection of a single, optimal, subset of descriptors for multiple model types. We apply this approach to three data sets, covering both regression and classification, and show that the constraint of forcing different model types to use the same set of descriptors does not lead to a significant loss in predictive ability for the individual models considered. In addition, interpretations of the individual models developed using this approach indicate that they encode similar structure-activity trends.
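
The idea of one descriptor subset shared by several model types can be illustrated with a minimal sketch. The abstract does not specify the search procedure or the objective function, so everything here is an assumption made for illustration: the two model types (ordinary least squares and k-nearest-neighbour regression), the joint objective (mean cross-validated RMSE over both models), the greedy forward search, and all function names and the synthetic data are hypothetical and are not the authors' method.

```python
import numpy as np

def ols_predict(X_train, y_train, X_test):
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def knn_predict(X_train, y_train, X_test, k=3):
    # k-nearest-neighbour regression with Euclidean distance.
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

# Assumed stand-ins for "multiple model types".
MODELS = (ols_predict, knn_predict)

def cv_rmse(predict, X, y, n_folds=5):
    # k-fold cross-validated RMSE using contiguous, deterministic folds.
    folds = np.array_split(np.arange(len(y)), n_folds)
    sq_err = 0.0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        pred = predict(X[train], y[train], X[test_idx])
        sq_err += np.sum((pred - y[test_idx]) ** 2)
    return float(np.sqrt(sq_err / len(y)))

def joint_score(X, y, subset):
    # Assumed joint objective: mean CV RMSE over all model types,
    # each model restricted to the same descriptor columns.
    cols = list(subset)
    return float(np.mean([cv_rmse(m, X[:, cols], y) for m in MODELS]))

def select_shared_subset(X, y, size):
    # Greedy forward selection: at each step add the descriptor that
    # lowers the joint score the most.
    chosen = []
    for _ in range(size):
        remaining = [j for j in range(X.shape[1]) if j not in chosen]
        best = min(remaining, key=lambda j: joint_score(X, y, chosen + [j]))
        chosen.append(best)
    return sorted(chosen)

def make_demo_data(n=120, n_desc=8, seed=0):
    # Synthetic data: the response depends only on descriptors 1 and 4.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, n_desc))
    y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=n)
    return X, y

if __name__ == "__main__":
    X, y = make_demo_data()
    subset = select_shared_subset(X, y, size=2)
    print("shared subset:", subset)
    print("joint score:", joint_score(X, y, subset))
```

Because both models are scored on the same columns, the search cannot favour a descriptor that helps only one of them unless the gain outweighs the loss for the other. A stochastic search could replace the greedy loop without changing `joint_score`.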

Similar articles

1
Ensemble feature selection: consistent descriptor subsets for multiple QSAR models.
J Chem Inf Model. 2007 May-Jun;47(3):989-97. doi: 10.1021/ci600563w. Epub 2007 Apr 4.
2
Development of linear, ensemble, and nonlinear models for the prediction and interpretation of the biological activity of a set of PDGFR inhibitors.
J Chem Inf Comput Sci. 2004 Nov-Dec;44(6):2179-89. doi: 10.1021/ci049849f.
3
Molecule kernels: a descriptor- and alignment-free quantitative structure-activity relationship approach.
J Chem Inf Model. 2008 Sep;48(9):1868-81. doi: 10.1021/ci800144y. Epub 2008 Sep 4.
4
Combinatorial QSAR modeling of P-glycoprotein substrates.
J Chem Inf Model. 2006 May-Jun;46(3):1245-54. doi: 10.1021/ci0504317.
5
Quantitative structure-activity relationship modeling of juvenile hormone mimetic compounds for Culex pipiens larvae, with a discussion of descriptor-thinning methods.
J Chem Inf Model. 2006 Jan-Feb;46(1):65-77. doi: 10.1021/ci050215y.
6
Stochastic versus stepwise strategies for quantitative structure-activity relationship generation--how much effort may the mining for successful QSAR models take?
J Chem Inf Model. 2007 May-Jun;47(3):927-39. doi: 10.1021/ci600476r. Epub 2007 May 5.
7
Comparative study of QSAR/QSPR correlations using support vector machines, radial basis function neural networks, and multiple linear regression.
J Chem Inf Comput Sci. 2004 Jul-Aug;44(4):1257-66. doi: 10.1021/ci049965i.
8
Predicting the genotoxicity of secondary and aromatic amines using data subsetting to generate a model ensemble.
J Chem Inf Comput Sci. 2003 May-Jun;43(3):949-63. doi: 10.1021/ci034013i.
9
Toward generating simpler QSAR models: nonlinear multivariate regression versus several neural network ensembles and some related methods.
J Chem Inf Comput Sci. 2003 Jul-Aug;43(4):1094-102. doi: 10.1021/ci025636j.
10
A comparison of methods for modeling quantitative structure-activity relationships.
J Med Chem. 2004 Oct 21;47(22):5541-54. doi: 10.1021/jm0497141.

Cited by

1
Screening, Synthesis, and QSAR Research on Cinnamaldehyde-Amino Acid Schiff Base Compounds as Antibacterial Agents.
Molecules. 2018 Nov 20;23(11):3027. doi: 10.3390/molecules23113027.
2
In Silico HCT116 Human Colon Cancer Cell-Based Models En Route to the Discovery of Lead-Like Anticancer Drugs.
Biomolecules. 2018 Jul 17;8(3):56. doi: 10.3390/biom8030056.
3
QSAR-assisted virtual screening of lead-like molecules from marine and microbial natural sources for antitumor and antibiotic drug discovery.
Molecules. 2015 Mar 17;20(3):4848-73. doi: 10.3390/molecules20034848.
4
A chemoinformatics approach to the discovery of lead-like molecules from marine and microbial sources en route to antitumor and antibiotic drugs.
Mar Drugs. 2014 Jan 27;12(2):757-78. doi: 10.3390/md12020757.
5
THE INTERACTIVE DECISION COMMITTEE FOR CHEMICAL TOXICITY ANALYSIS.
J Stat Res. 2012;46(2):157-186.
6
Random forests for feature selection in QSPR Models - an application for predicting standard enthalpy of formation of hydrocarbons.
J Cheminform. 2013 Feb 11;5(1):9. doi: 10.1186/1758-2946-5-9.
7
In silico approach to screen compounds active against parasitic nematodes of major socio-economic importance.
BMC Bioinformatics. 2011;12 Suppl 13(Suppl 13):S25. doi: 10.1186/1471-2105-12-S13-S25. Epub 2011 Nov 30.
8
On the interpretation and interpretability of quantitative structure-activity relationship models.
J Comput Aided Mol Des. 2008 Dec;22(12):857-71. doi: 10.1007/s10822-008-9240-5. Epub 2008 Sep 11.
9
Considerations and recent advances in QSAR models for cytochrome P450-mediated drug metabolism prediction.
J Comput Aided Mol Des. 2008 Nov;22(11):843-55. doi: 10.1007/s10822-008-9225-4. Epub 2008 Jun 24.
10
Asymmetric bagging and feature selection for activities prediction of drug molecules.
BMC Bioinformatics. 2008 May 28;9 Suppl 6(Suppl 6):S7. doi: 10.1186/1471-2105-9-S6-S7.