Department of Statistical Sciences, University of Padova, Italy.
Department of Statistical Sciences "Paolo Fortunati", University of Bologna, Italy.
Stat Methods Med Res. 2022 Jul;31(7):1325-1341. doi: 10.1177/09622802221089029. Epub 2022 Mar 31.
Statistical evaluation of diagnostic tests and, more generally, of biomarkers is a constantly developing field in which the complexity of the assessment increases with the complexity of the design under which data are collected. One particularly prevalent type of data is clustered data, where individual units are naturally nested within clusters. In these cases, bias can arise when cluster-level effects and/or individual covariates are omitted from the evaluation process. Focusing on the three-class case and on continuous-valued diagnostic tests, we investigate how to exploit the clustered structure of data within a linear mixed model approach, both when the assumption of normality holds and when it does not. We provide a method for the estimation of covariate-specific receiver operating characteristic surfaces and discuss methods for the choice of optimal thresholds, proposing three possible estimators. A proof of consistency and asymptotic normality of the proposed threshold estimators is given. All considered methods are evaluated by extensive simulation experiments. As an application, we study the use of the gene expression as a biomarker to distinguish among three types of glutamatergic neurons.
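To make the quantities in the abstract concrete, the sketch below computes one empirical point of a three-class receiver operating characteristic surface for a continuous test and a threshold pair. It is only an illustration, not the authors' estimator: it ignores clustering and covariates (exactly the omissions the abstract warns can cause bias) and uses no linear mixed model. The function names, the classification rule (class 1 if the value is at most `t1`, class 3 if it exceeds `t2`, class 2 otherwise), and the volume under the surface (VUS) summary are assumptions introduced here and are not taken from the abstract.

```python
import numpy as np


def true_class_fractions(x1, x2, x3, t1, t2):
    """Empirical true class fractions at thresholds t1 < t2.

    Assumed rule: class 1 if x <= t1, class 2 if t1 < x <= t2, class 3 if x > t2.
    x1, x2, x3 hold test values for subjects truly in classes 1, 2, 3.
    """
    if t1 >= t2:
        raise ValueError("thresholds must satisfy t1 < t2")
    x1, x2, x3 = (np.asarray(v, dtype=float) for v in (x1, x2, x3))
    tcf1 = np.mean(x1 <= t1)                   # class 1 correctly classified
    tcf2 = np.mean((x2 > t1) & (x2 <= t2))     # class 2 correctly classified
    tcf3 = np.mean(x3 > t2)                    # class 3 correctly classified
    return float(tcf1), float(tcf2), float(tcf3)


def empirical_vus(x1, x2, x3):
    """Fraction of triples (one value per class) with x1 < x2 < x3.

    Ties are counted as incorrectly ordered in this simple version.
    """
    x1, x2, x3 = (np.asarray(v, dtype=float) for v in (x1, x2, x3))
    # For each class-2 value: how many class-1 values lie below it
    lower = (x1[:, None] < x2[None, :]).sum(axis=0)
    # For each class-2 value: how many class-3 values lie above it
    upper = (x3[:, None] > x2[None, :]).sum(axis=0)
    return float(np.sum(lower * upper) / (x1.size * x2.size * x3.size))
```

Sweeping `(t1, t2)` over a grid with `true_class_fractions` traces out an empirical surface. A covariate-specific, cluster-aware version would replace these raw proportions with model-based quantities, which this sketch does not attempt.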