
Canonical Measure of Correlation (CMC) and Canonical Measure of Distance (CMD) between sets of data Part 2. Variable reduction.

Authors

Consonni V, Ballabio D, Manganaro A, Mauri A, Todeschini R

Affiliation

Milano Chemometrics and QSAR Research Group, Department of Environmental Sciences, University of Milano-Bicocca, I-20126 Milano, Italy.

Publication information

Anal Chim Acta. 2009 Aug 19;648(1):52-9. doi: 10.1016/j.aca.2009.06.035. Epub 2009 Jun 21.

DOI:10.1016/j.aca.2009.06.035
PMID:19616689
Abstract

This paper proposes a new method for determining the subset of variables that reproduces as well as possible the main structural features of the complete data set. The method can be useful for pre-treatment of large data sets, since it allows variables that contain redundant information to be discarded. Reducing the number of variables often makes it easier to investigate the data structure and to obtain more stable results from multivariate modelling methods. The novel method is based on the recently proposed canonical measure of correlation (CMC index) between two sets of variables [R. Todeschini, V. Consonni, A. Manganaro, D. Ballabio, A. Mauri, Canonical Measure of Correlation (CMC) and Canonical Measure of Distance (CMD) between sets of data. Part 1. Theory and simple chemometric applications, Anal. Chim. Acta submitted for publication (2009)]. Following a stepwise procedure (backward elimination), each variable in turn is compared to all the other variables, and the most correlated one is definitively discarded. Finally, a key subset of variables that are as orthogonal as possible is selected. The performance was evaluated on both simulated and real data sets. The effectiveness of the novel method is discussed by comparison with the results of other well-known methods for variable reduction, such as Jolliffe techniques, McCabe criteria, the Krzanowski approach and its modification based on genetic algorithms, loadings of the first principal component, Key Set Factor Analysis (KSFA), Variance Inflation Factor (VIF), the pairwise correlation approach, and K correlation analysis (KIF). The results obtained are consistent with those of the other methods considered; moreover, the proposed CMC method has the advantage that the calculation is very quick and can be easily implemented in any software application.
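
The backward-elimination loop described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not give the CMC formula, so the sketch swaps in the squared multiple correlation (R²) of each variable against the remaining ones as a stand-in redundancy score. The function names `r2_against_others` and `backward_reduce`, and the `n_keep` stopping rule, are assumptions made for this example.

```python
import numpy as np


def r2_against_others(Z, j, others):
    """Squared multiple correlation of column j with the columns in `others`.

    Z must be column-centered, so no intercept term is needed.
    NOTE: this R^2 score is a stand-in, not the CMC index from the paper.
    """
    y = Z[:, j]
    A = Z[:, others]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / (y @ y)


def backward_reduce(X, n_keep):
    """Drop the most redundant variable one at a time until n_keep remain.

    Returns the indices of the retained columns (hypothetical stopping rule:
    a fixed number of variables to keep). Assumes no constant columns.
    """
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # autoscale each column
    kept = list(range(X.shape[1]))
    while len(kept) > n_keep:
        scores = [
            r2_against_others(Z, j, [k for k in kept if k != j])
            for j in kept
        ]
        # Discard the variable best explained by all the others.
        kept.pop(int(np.argmax(scores)))
    return kept


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(200, 3))
    # Fourth column is a near-copy of the first, i.e. redundant.
    near_copy = base[:, 0] + 0.01 * rng.normal(size=200)
    X = np.column_stack([base, near_copy])
    print(backward_reduce(X, n_keep=3))
```

In the demo data the fourth column nearly duplicates the first, so one of that pair is expected to be eliminated first while the two independent columns survive. A correlation threshold could replace `n_keep` as the stopping rule; the abstract does not say which criterion the authors use.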


Similar articles

1
Canonical Measure of Correlation (CMC) and Canonical Measure of Distance (CMD) between sets of data Part 2. Variable reduction.
Anal Chim Acta. 2009 Aug 19;648(1):52-9. doi: 10.1016/j.aca.2009.06.035. Epub 2009 Jun 21.
2
Canonical Measure of Correlation (CMC) and Canonical Measure of Distance (CMD) between sets of data. Part 1. Theory and simple chemometric applications.
Anal Chim Acta. 2009 Aug 19;648(1):45-51. doi: 10.1016/j.aca.2009.06.032. Epub 2009 Jun 21.
3
Canonical Measure of Correlation (CMC) and Canonical Measure of Distance (CMD) between sets of data. Part 3. Variable selection in classification.
Anal Chim Acta. 2010 Jan 11;657(2):116-22. doi: 10.1016/j.aca.2009.10.033.
4
PCA in studying coordination and variability: a tutorial.
Clin Biomech (Bristol). 2004 May;19(4):415-28. doi: 10.1016/j.clinbiomech.2004.01.005.
5
Canonical correlation analysis for multilabel classification: a least-squares formulation, extensions, and analysis.
IEEE Trans Pattern Anal Mach Intell. 2011 Jan;33(1):194-200. doi: 10.1109/TPAMI.2010.160.
6
Extension of quadrature orthogonal signal corrected two-dimensional (QOSC 2D) correlation spectroscopy I: principal component analysis based QOSC 2D.
Appl Spectrosc. 2007 Oct;61(10):1040-4. doi: 10.1366/000370207782217761.
7
A segmented principal component analysis-regression approach to quantitative structure-activity relationship modeling.
Anal Chim Acta. 2009 Jul 30;646(1-2):30-8. doi: 10.1016/j.aca.2009.05.003. Epub 2009 May 9.
8
Generalized covariance-adjusted canonical correlation analysis with application to psychiatry.
Stat Med. 2003 Feb 28;22(4):595-610. doi: 10.1002/sim.1332.
9
Estimating the polychoric correlation from misclassified data.
Br J Math Stat Psychol. 2008 May;61(Pt 1):49-74. doi: 10.1348/000711006X131136.
10
Quadrature orthogonal signal corrected two-dimensional correlation spectroscopy.
Appl Spectrosc. 2006 Jun;60(6):605-10. doi: 10.1366/000370206777670657.