Wang Jinjuan, Zhao Yunpeng, Tang Larry L, Mueller Claudius, Li Qizhai
School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, People's Republic of China.
School of Mathematical and Natural Sciences, Arizona State University, Tempe, AZ, USA.
J Appl Stat. 2021 Sep 22;49(16):4278-4293. doi: 10.1080/02664763.2021.1977785. eCollection 2022.
In disease screening, a biomarker combination built from multiple markers tends to have higher sensitivity than any individual marker. Parametric methods for marker combination rely on inverting covariance matrices, which is often non-trivial for the high-dimensional data generated by modern high-throughput technologies. Another common problem in disease diagnosis is an instrument's limit of detection (LOD): when a biomarker's value falls below the limit, it cannot be observed and is recorded as NA. To handle both challenges when combining high-dimensional biomarkers in the presence of an LOD, we propose a resample-replace lasso procedure. We first impute the values below the LOD and then use the graphical lasso to estimate the means and precision matrices of the high-dimensional biomarkers. Simulation results show that our method outperforms alternatives such as substituting the LOD value for NA values or removing observations that contain NA values. A real-data analysis of a protein profiling study of glioblastoma patients, grouped by survival status, indicates that the biomarker combination obtained with the proposed method distinguishes the two groups more accurately.
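The impute-then-estimate idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the uniform draw on (0, LOD) is a placeholder for the resample-replace step, whose details the abstract does not give; the simulated data, the penalty `alpha=0.1`, and the helper names `impute_below_lod` and `fit_group` are hypothetical; and the final weight vector uses the classical linear combination for two normal groups, which the abstract does not confirm as the rule used in the paper. The sketch assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso


def impute_below_lod(X, lod, rng):
    """Replace NaN entries (values below the LOD) with uniform draws on (0, lod).

    Assumption: this uniform draw is only a placeholder for the paper's
    resample-replace step, which the abstract does not describe in detail.
    """
    X = np.array(X, dtype=float)  # copy, so the caller's array is untouched
    missing = np.isnan(X)
    X[missing] = rng.uniform(0.0, lod, size=int(missing.sum()))
    return X


def fit_group(X, lod, alpha, rng):
    """Impute, then estimate mean, covariance and sparse precision matrix."""
    X_filled = impute_below_lod(X, lod, rng)
    model = GraphicalLasso(alpha=alpha).fit(X_filled)
    return model.location_, model.covariance_, model.precision_


rng = np.random.default_rng(0)
n, p, lod = 200, 5, 0.5

# Hypothetical data: two groups of marker values with shifted means.
controls = rng.normal(2.0, 1.0, size=(n, p))
cases = rng.normal(2.5, 1.0, size=(n, p))
for X in (controls, cases):
    X[X < lod] = np.nan  # values below the LOD are not observed

mu0, cov0, prec0 = fit_group(controls, lod, alpha=0.1, rng=rng)
mu1, cov1, prec1 = fit_group(cases, lod, alpha=0.1, rng=rng)

# Assumed combination rule (classical linear combination for two normal
# groups with unequal covariances): weights = (cov0 + cov1)^{-1} (mu1 - mu0).
weights = np.linalg.solve(cov0 + cov1, mu1 - mu0)

# Combined score for each subject, computed on the imputed data.
case_scores = impute_below_lod(cases, lod, rng) @ weights
```

The graphical lasso returns a regularized precision matrix even when the sample covariance is ill-conditioned, which is why it suits the high-dimensional setting the abstract describes; the penalty would normally be chosen by cross-validation rather than fixed.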