A forecasting method with efficient selection of variables in multivariate data sets.

Authors

Sagar Pinki, Gupta Prinima, Kashyap Indu

Affiliation

Manav Rachna International Institute of Research and Studies, Faridabad, Haryana India.

Publication details

Int J Inf Technol. 2021;13(3):1039-1046. doi: 10.1007/s41870-021-00619-9. Epub 2021 Feb 28.

DOI: 10.1007/s41870-021-00619-9
PMID: 33681697
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7914390/
Abstract

Regression is a data analysis technique that models the relationship between an independent variable (x) and a dependent variable (y); in polynomial regression, that relationship is modeled as a polynomial of up to the nth degree. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y|x). In this paper, polynomial regression analysis is improved through efficient variable selection based on the coefficient of determination. The coefficient of determination is the square of the correlation between the predicted y values and the actual y values, and it ranges from 0 to 1. The main purpose of regression analysis is to discover the relationship between the independent and dependent variables; in other words, it explains the variation in one variable in terms of another. The main focus of this paper is on multivariate data sets, which have many attributes, not all of which are necessarily required for data analysis. Using the coefficient of determination (COD), irrelevant attributes are eliminated during analysis. The main objectives of the research are to reduce the cost of data maintenance, reduce execution time, and improve prediction accuracy. COD helps in selecting suitable independent variables. It is a measure used in statistical analysis to assess how well a model explains and predicts future outcomes. The method also eliminates irrelevant variables that the prediction model does not need, thereby reducing maintenance cost and the size of data sets.
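The abstract describes screening candidate variables by their coefficient of determination, defined as the squared correlation between predicted and actual y. The Python sketch below illustrates one plausible reading of that idea; it is not the authors' implementation, and the paper's exact procedure may differ. It assumes each candidate variable x_j is screened on its own: a degree-d polynomial of y on x_j is fitted by ordinary least squares (solved via the normal equations), the fit is scored with COD = corr(ŷ, y)², and variables whose score reaches a cut-off are kept. The function names, the default degree of 2 and the threshold of 0.5 are hypothetical choices, not values taken from the paper.

```python
from typing import Dict, List, Sequence


def fit_polynomial(x: Sequence[float], y: Sequence[float], degree: int) -> List[float]:
    """Least-squares fit of y on a polynomial of the given degree in x.

    Returns coefficients [c0, c1, ..., c_degree], lowest power first.
    Solves the normal equations (X^T X) c = X^T y, where X[i][k] = x_i ** k.
    Assumes x has at least degree + 1 distinct values (otherwise X^T X is singular).
    """
    n = degree + 1
    # Build the augmented normal-equation matrix [X^T X | X^T y].
    m = [
        [float(sum(xi ** (r + c) for xi in x)) for c in range(n)]
        + [float(sum(yi * xi ** r for xi, yi in zip(x, y)))]
        for r in range(n)
    ]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        rhs = m[r][n] - sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = rhs / m[r][r]
    return coef


def predict(coef: Sequence[float], x: Sequence[float]) -> List[float]:
    """Evaluate the fitted polynomial at each x value."""
    return [sum(c * xi ** k for k, c in enumerate(coef)) for xi in x]


def coefficient_of_determination(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """COD as the squared Pearson correlation between predicted and actual y (0 to 1)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mean_t) ** 2 for a in y_true)
    var_p = sum((b - mean_p) ** 2 for b in y_pred)
    if var_t == 0 or var_p == 0:
        return 0.0  # no variation to explain, or a constant prediction
    return min(1.0, cov ** 2 / (var_t * var_p))  # guard against rounding above 1


def select_variables(
    data: Dict[str, Sequence[float]],
    y: Sequence[float],
    degree: int = 2,
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Hypothetical screening step: keep each x_j whose own polynomial fit reaches COD >= threshold."""
    scores = {}
    for name, x in data.items():
        coef = fit_polynomial(x, y, degree)
        scores[name] = coefficient_of_determination(y, predict(coef, x))
    return {name: s for name, s in scores.items() if s >= threshold}


if __name__ == "__main__":
    x1 = [0, 1, 2, 3, 4, 5, 6, 7]
    x2 = [3, 1, 4, 1, 5, 9, 2, 6]  # illustrative attribute unrelated to y
    y = [1 + 2 * v + 0.5 * v * v for v in x1]  # exact quadratic in x1
    print(select_variables({"x1": x1, "x2": x2}, y, degree=2, threshold=0.9))
```

Solving the normal equations directly is adequate for low degrees and small data sets; for higher degrees a QR-based least-squares solver (for example `numpy.linalg.lstsq`) is numerically safer. Note also that screening one variable at a time ignores interactions between attributes, which a multivariate fit would capture.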

Similar articles

1. A forecasting method with efficient selection of variables in multivariate data sets.
   Int J Inf Technol. 2021;13(3):1039-1046. doi: 10.1007/s41870-021-00619-9. Epub 2021 Feb 28.
2. Biostatistics Series Module 6: Correlation and Linear Regression.
   Indian J Dermatol. 2016 Nov-Dec;61(6):593-601. doi: 10.4103/0019-5154.193662.
3. Comparison of artificial neural network and multiple linear regression in the optimization of formulation parameters of leuprolide acetate loaded liposomes.
   J Pharm Pharm Sci. 2005 Aug 5;8(2):243-58.
4. Mathematical modelling of preparation of acyclovir liposomes: reverse phase evaporation method.
   J Pharm Pharm Sci. 2002 Sep-Dec;5(3):285-91.
5. Discussion on regression analysis with small determination coefficient in human-environment researches.
   Indoor Air. 2022 Oct;32(10):e13117. doi: 10.1111/ina.13117.
6. Multivariate modeling of complications with data driven variable selection: guarding against overfitting and effects of data set size.
   Radiother Oncol. 2012 Oct;105(1):115-21. doi: 10.1016/j.radonc.2011.12.006. Epub 2012 Jan 20.
7. Canonical Measure of Correlation (CMC) and Canonical Measure of Distance (CMD) between sets of data. Part 3. Variable selection in classification.
   Anal Chim Acta. 2010 Jan 11;657(2):116-22. doi: 10.1016/j.aca.2009.10.033.
8. Modeling the performance of "up-flow anaerobic sludge blanket" reactor based wastewater treatment plant using linear and nonlinear approaches--a case study.
   Anal Chim Acta. 2010 Jan 18;658(1):1-11. doi: 10.1016/j.aca.2009.11.001. Epub 2009 Nov 10.
9. Multi-step polynomial regression method to model and forecast malaria incidence.
   PLoS One. 2009;4(3):e4726. doi: 10.1371/journal.pone.0004726. Epub 2009 Mar 6.
10. Examining variable selection methods for the predictive performance of regression models and the proportion of selected variables and selected random variables.
    Heliyon. 2021 Jun 18;7(6):e07356. doi: 10.1016/j.heliyon.2021.e07356. eCollection 2021 Jun.

Cited by

1. Comparison of learning models to predict LDPE, PET, and ABS concentrations in beach sediment based on spectral reflectance.
   Sci Rep. 2023 Apr 17;13(1):6258. doi: 10.1038/s41598-023-33207-x.

References

1. A novel framework for COVID-19 case prediction through piecewise regression in India.
   Int J Inf Technol. 2021;13(1):41-48. doi: 10.1007/s41870-020-00552-3. Epub 2020 Nov 10.
2. Data analysis of COVID-2019 epidemic using machine learning methods: a case study of India.
   Int J Inf Technol. 2020;12(4):1321-1330. doi: 10.1007/s41870-020-00484-y. Epub 2020 May 26.