Chi-MIC-share: a new feature selection algorithm for quantitative structure-activity relationship models.

Authors

Li Yuting, Dai Zhijun, Cao Dan, Luo Feng, Chen Yuan, Yuan Zheming

Affiliations

Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-making, Hunan Agricultural University, 410128, China.

School of Computing, Clemson University, Clemson, SC, USA.

Publication information

RSC Adv. 2020 May 27;10(34):19852-19860. doi: 10.1039/d0ra00061b. eCollection 2020 May 26.

DOI:10.1039/d0ra00061b
PMID:35520405
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9054197/
Abstract

Quantitative structure-activity relationship models are used in toxicology to predict the effects of organic compounds on aquatic organisms. Common filter feature selection methods use correlation statistics to rank features, but this approach considers only the correlation between a single feature and the response variable and does not take into account feature redundancy. Although the minimal redundancy maximal relevance approach considers the redundancy among features, direct removal of the redundant features may result in loss of prediction accuracy, and cross-validation of training sets to select an optimal subset of features is time-consuming. In this paper, we describe the development of a feature selection method, Chi-MIC-share, which can terminate feature selection automatically and is based on an improved maximal information coefficient and a redundant allocation strategy. We validated Chi-MIC-share using three environmental toxicology datasets and a support vector regression model. The results show that Chi-MIC-share is more accurate than other feature selection methods. We also performed a significance test on the model and analyzed the single-factor effects of the reserved descriptors.
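
The contrast the abstract draws between relevance-only filter ranking and redundancy-aware selection can be illustrated with a small sketch. This is not the Chi-MIC-share algorithm, whose details are not given in the abstract. It assumes absolute Pearson correlation as a stand-in for the maximal information coefficient, a simple "relevance minus mean redundancy" greedy score as one common mRMR-style variant, and a fixed subset size `k` rather than automatic termination. The function names and the synthetic data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_by_relevance(features, y):
    """Plain filter ranking: order feature names by |corr(feature, y)|, highest first."""
    return sorted(features, key=lambda name: abs(pearson(features[name], y)), reverse=True)


def mrmr_select(features, y, k):
    """Greedy mRMR-style pick of k features: relevance minus mean redundancy."""
    relevance = {name: abs(pearson(col, y)) for name, col in features.items()}
    selected = []
    remaining = sorted(features)  # sorted names give deterministic tie-breaking
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            # Mean absolute correlation with the features already chosen.
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data (hypothetical): y depends on a and b; a_copy nearly duplicates a.
rng = random.Random(0)
n = 200
a = [rng.gauss(0, 1) for _ in range(n)]
b = [rng.gauss(0, 1) for _ in range(n)]
a_copy = [v + 0.01 * rng.gauss(0, 1) for v in a]
y = [ai + 0.5 * bi for ai, bi in zip(a, b)]
features = {"a": a, "a_copy": a_copy, "b": b}

ranked = rank_by_relevance(features, y)
selected = mrmr_select(features, y, k=2)
print("relevance-only top 2:", ranked[:2])
print("mRMR-style top 2:   ", selected)
```

In this toy setup the relevance-only ranking puts the two near-identical features first, while the redundancy-penalized score prefers the weaker but complementary feature `b` as its second pick. The fixed `k` is the part that would normally be tuned by cross-validation, which is the cost the abstract says automatic termination avoids.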
