Department of Mathematical Sciences, University of Essex, Colchester, UK.
Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany.
BMC Med Res Methodol. 2019 Mar 6;19(1):46. doi: 10.1186/s12874-019-0666-3.
With progress on both the theoretical and the computational fronts, spline modelling has become an established tool in statistical regression analysis. An important issue in spline modelling is the availability of user-friendly, well-documented software packages. Following the idea of the STRengthening Analytical Thinking for Observational Studies (STRATOS) initiative to provide users with guidance documents on the application of statistical methods in observational research, the aim of this article is to provide an overview of the most widely used spline-based techniques and their implementation in R.
In this work, we focus on the R Language for Statistical Computing, which has become hugely popular statistical software. We identified a set of packages that include functions for spline modelling within a regression framework. Using simulated and real data, we provide an introduction to spline modelling and an overview of the most popular spline functions.
We present a series of simple scenarios with univariate data, in which different basis functions are used to identify the correct functional form of an independent variable. Even with simple data, routines from different packages can lead to different results.
This work illustrates the challenges that an analyst faces when working with data. Most differences can be attributed to the choice of hyper-parameters rather than to the basis used. In fact, an experienced user will know how to obtain a reasonable outcome regardless of the type of spline used. However, many analysts do not have sufficient knowledge to use these powerful tools adequately and will need more guidance.
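The claim that differences stem mainly from hyper-parameters rather than from the basis can be illustrated with a minimal sketch. It is written in Python with NumPy rather than with any of the R packages reviewed in the article, and the helper names (`truncated_power_basis`, `bspline_basis`, `ls_fit`, `interior_knots`) and the simulated sine-plus-noise data are hypothetical. With the same interior knots, an unpenalized cubic truncated-power basis and a cubic B-spline basis span the same function space, so their least-squares fits coincide up to rounding. Changing the number of knots (a hyper-parameter) changes the fit.

```python
import numpy as np


def truncated_power_basis(x, knots, degree=3):
    """Columns 1, x, ..., x^degree, then (x - k)_+^degree for each interior knot k."""
    cols = [x ** p for p in range(degree + 1)]
    cols += [np.clip(x - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)


def bspline_basis(x, knots, degree=3):
    """B-spline basis via the Cox-de Boor recursion (boundary knots at min/max of x)."""
    lo, hi = x.min(), x.max()
    t = np.concatenate([np.repeat(lo, degree + 1), knots, np.repeat(hi, degree + 1)])
    # Degree-0 basis: indicators of the half-open intervals [t_i, t_{i+1}).
    B = np.zeros((len(x), len(t) - 1))
    for i in range(len(t) - 1):
        B[:, i] = (x >= t[i]) & (x < t[i + 1])
    # Assign the right boundary point to the last non-empty interval.
    last = np.nonzero(t[:-1] < t[1:])[0].max()
    B[x == hi, last] = 1.0
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        B_next = np.zeros((len(x), len(t) - 1 - d))
        for i in range(len(t) - 1 - d):
            den_l = t[i + d] - t[i]
            den_r = t[i + d + 1] - t[i + 1]
            if den_l > 0:
                B_next[:, i] += (x - t[i]) / den_l * B[:, i]
            if den_r > 0:
                B_next[:, i] += (t[i + d + 1] - x) / den_r * B[:, i + 1]
        B = B_next
    return B


def ls_fit(X, y):
    """Ordinary least-squares fitted values for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


def interior_knots(k):
    """k equally spaced interior knots on (0, 1)."""
    return np.linspace(0.0, 1.0, k + 2)[1:-1]


# Hypothetical simulated data: a sine curve plus Gaussian noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Same knots, different bases: the fitted curves agree up to rounding error.
knots = interior_knots(5)
fit_tp = ls_fit(truncated_power_basis(x, knots), y)
fit_bs = ls_fit(bspline_basis(x, knots), y)
print("max |TP - BS|:", np.max(np.abs(fit_tp - fit_bs)))

# Same basis, different hyper-parameter (number of knots): the fits differ.
fit_few = ls_fit(bspline_basis(x, interior_knots(1)), y)
fit_many = ls_fit(bspline_basis(x, interior_knots(9)), y)
print("max |1 knot - 9 knots|:", np.max(np.abs(fit_few - fit_many)))
```

The sketch deliberately uses plain least squares with a fixed knot grid. Penalized approaches, such as smoothing splines or P-splines, replace the knot count with a smoothing parameter as the main hyper-parameter, but the same point applies.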