Tavallali Peyman, Razavi Marianne, Brady Sean
Division of Engineering and Applied Sciences, California Institute of Technology, Pasadena, California, United States of America.
Principium Consulting, LLC, Pasadena, California, United States of America.
PLoS One. 2017 Nov 13;12(11):e0187676. doi: 10.1371/journal.pone.0187676. eCollection 2017.
In this article, we propose a new data mining algorithm that both captures non-linearity in the data and finds the best subset model. To produce an enhanced subset of the original variables, a selection method should add a supplementary level of regression analysis that captures complex relationships in the data through mathematical transformation of the predictors and exploration of synergistic effects of combined variables. The method presented here can produce an optimal subset of variables, making the overall process of model selection more efficient. By transforming the original inputs, the algorithm yields interpretable parameters and a faithful fit to the data. The core objective of this paper is to introduce a new estimation technique for the classical least squares regression framework. This new automatic variable transformation and model selection method offers an optimal and stable model that minimizes mean square error and variability by combining all-possible-subsets selection with variable transformations and interactions. Moreover, the method controls multicollinearity, leading to an optimal set of explanatory variables.
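The combination described above, transforming and interacting the original predictors and then exhaustively searching subsets by least squares, can be illustrated with a minimal sketch. This is not the authors' implementation; the function name, the particular transformations (squares and pairwise products), and the subset-size cap are illustrative assumptions.

```python
import itertools
import numpy as np

def best_subset_ls(X, y, max_size=3):
    """Exhaustive best-subset least squares over transformed predictors.

    Illustrative sketch only: augments the original predictors with
    squared terms and pairwise interactions, then searches all subsets
    up to `max_size` for the one minimizing mean squared error.
    """
    n, p = X.shape
    # Build the augmented design: originals, squares, pairwise interactions.
    cols = [X[:, j] for j in range(p)]
    names = [f"x{j}" for j in range(p)]
    for j in range(p):
        cols.append(X[:, j] ** 2)
        names.append(f"x{j}^2")
    for j, k in itertools.combinations(range(p), 2):
        cols.append(X[:, j] * X[:, k])
        names.append(f"x{j}*x{k}")
    Z = np.column_stack(cols)

    best_mse, best_names = np.inf, None
    for size in range(1, max_size + 1):
        for subset in itertools.combinations(range(Z.shape[1]), size):
            # Ordinary least squares fit with an intercept on this subset.
            A = np.column_stack([np.ones(n), Z[:, list(subset)]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            mse = np.mean((y - A @ beta) ** 2)
            if mse < best_mse:
                best_mse, best_names = mse, [names[i] for i in subset]
    return best_mse, best_names
```

Because the search is exhaustive over the augmented design, a synergistic effect such as a pure interaction between two predictors can be recovered as a single selected term rather than being approximated by several linear ones.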