Hagemann Niklas, Möllenhoff Kathrin
Institute of Medical Statistics and Computational Biology (IMSB), Faculty of Medicine, University of Cologne, Cologne, Germany.
Stat Med. 2025 Mar 15;44(6):e10309. doi: 10.1002/sim.10309.
A common problem in numerous research areas, particularly in clinical trials, is to test whether the effect of an explanatory variable on an outcome variable is equivalent across different groups. In practice, such tests are frequently used to compare the effect between patient groups defined, for example, by gender, age, or treatment. Equivalence is usually assessed by testing whether the difference between the groups does not exceed a pre-specified equivalence threshold. Classical approaches are based on testing the equivalence of single quantities, for example, the mean, the area under the curve, or other values of interest. However, when differences depend on a particular covariate, these approaches can be inaccurate. Instead, whole regression curves over the entire covariate range, describing, for instance, a time window or a dose range, are considered, and tests are based on a suitable distance measure between two such curves, such as the maximum absolute distance. A key assumption in this setting is that the true underlying regression models are known, which is rarely the case in practice; misspecification can lead to severe problems such as inflated type I errors or, conversely, overly conservative test procedures. In this paper, we address this problem by introducing a flexible extension of such an equivalence test based on model averaging, which removes this assumption and makes the test applicable under model uncertainty. Precisely, we introduce model averaging based on smooth Bayesian information criterion (BIC) weights and propose a testing procedure that exploits the duality between confidence intervals and hypothesis tests. We demonstrate the validity of our approach in a simulation study and illustrate its practical relevance in a time-response case study with toxicological gene expression data.
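To make the approach concrete, the following is a minimal Python sketch (not the authors' implementation) of two ingredients described above: smooth BIC model-averaging weights, computed per group over a small set of hypothetical candidate regression models, and the maximum absolute distance between the two model-averaged curves over the covariate range. The candidate functions, the simulated data, and all parameter values are illustrative assumptions; the confidence-interval-based decision step of the actual test is only indicated in a closing comment.

# Illustrative sketch (assumed setup, not the authors' code): smooth-BIC
# model averaging per group and the maximum absolute distance between the
# two model-averaged regression curves.
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical candidate mean functions for a time- or dose-response setting.
def linear(x, a, b):          return a + b * x
def emax(x, e0, emax_, ec50): return e0 + emax_ * x / (ec50 + x)
def quadratic(x, a, b, c):    return a + b * x + c * x ** 2
CANDIDATES = [(linear, 2), (emax, 3), (quadratic, 3)]

def smooth_bic_fit(x, y):
    """Fit each candidate model; return smooth BIC weights and the fits."""
    n = len(y)
    fits, bics = [], []
    for f, k in CANDIDATES:
        theta, _ = curve_fit(f, x, y, p0=np.ones(k), maxfev=10000)
        rss = np.sum((y - f(x, *theta)) ** 2)
        # Gaussian log-likelihood up to constants; k mean parameters + sigma.
        bic = n * np.log(rss / n) + (k + 1) * np.log(n)
        fits.append((f, theta)); bics.append(bic)
    bics = np.array(bics)
    w = np.exp(-(bics - bics.min()) / 2)   # smooth BIC weights (shifted for stability)
    return w / w.sum(), fits

def averaged_curve(weights, fits, grid):
    """Model-averaged prediction on a covariate grid."""
    return sum(w * f(grid, *theta) for w, (f, theta) in zip(weights, fits))

# Toy data for two groups observed over the same covariate (e.g., time) range.
rng = np.random.default_rng(1)
x = np.linspace(0.1, 10, 60)
y1 = 1 + 2.0 * x / (3 + x) + rng.normal(0, 0.2, x.size)
y2 = 1 + 2.2 * x / (3 + x) + rng.normal(0, 0.2, x.size)

grid = np.linspace(0.1, 10, 200)
w1, f1 = smooth_bic_fit(x, y1)
w2, f2 = smooth_bic_fit(x, y2)
d_max = np.max(np.abs(averaged_curve(w1, f1, grid) - averaged_curve(w2, f2, grid)))
print(f"maximum absolute distance between averaged curves: {d_max:.3f}")
# In the paper's procedure, d_max would be compared with the equivalence
# threshold via a confidence interval, exploiting the duality between
# confidence intervals and hypothesis tests; that step is omitted here.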