Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France.
Biostatistics and Programming Department, Sanofi R&D, 91380, Chilly Mazarin, France.
BMC Bioinformatics. 2023 Jan 23;24(1):25. doi: 10.1186/s12859-023-05143-0.
In clinical trials, the identification of prognostic and predictive biomarkers has become essential to precision medicine. Prognostic biomarkers can help prevent the occurrence of the disease, and predictive biomarkers can be used to identify patients who may benefit from the treatment. Previous research has mainly focused on clinical characteristics, and the use of genomic data in this area has hardly been studied. A new method is required to simultaneously select prognostic and predictive biomarkers in high-dimensional genomic data where biomarkers are highly correlated. We propose a novel approach called PPLasso that integrates prognostic and predictive effects into one statistical model. PPLasso also takes into account the correlations between biomarkers, which can alter the accuracy of biomarker selection. Our method consists of transforming the design matrix to remove the correlations between the biomarkers before applying the generalized Lasso. In a comprehensive numerical evaluation, we show that PPLasso outperforms the traditional Lasso and other extensions on both prognostic and predictive biomarker identification in various scenarios. Finally, our method is applied to publicly available transcriptomic and proteomic data.
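The decorrelation step described above can be illustrated with a minimal sketch. It is not the PPLasso implementation: it assumes the transformation is a whitening of the centered design matrix by the inverse square root of the sample covariance, and it uses a toy setting with more samples than biomarkers so that this covariance is invertible. The function name `whiten_design`, the `ridge` floor, and the simulated data are all hypothetical. The generalized Lasso fit that follows the transformation is not shown.

```python
import numpy as np


def whiten_design(X, ridge=1e-8):
    """Center X and multiply by Sigma^{-1/2} (assumed whitening step).

    Sigma is the sample covariance of the columns of X. `ridge` is a
    hypothetical floor on the eigenvalues to avoid dividing by zero.
    """
    Xc = X - X.mean(axis=0)
    sigma = np.cov(Xc, rowvar=False)
    # Symmetric eigendecomposition: sigma = V diag(w) V^T
    eigvals, eigvecs = np.linalg.eigh(sigma)
    eigvals = np.maximum(eigvals, ridge)
    inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return Xc @ inv_sqrt, inv_sqrt


# Simulated, highly correlated "biomarkers" (toy data, not from the paper).
rng = np.random.default_rng(0)
n, p, rho = 200, 5, 0.8
true_sigma = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), true_sigma, size=n)

X_tilde, inv_sqrt = whiten_design(X)

# Largest absolute off-diagonal correlation before and after whitening.
max_off_before = np.abs(np.corrcoef(X, rowvar=False) - np.eye(p)).max()
max_off_after = np.abs(np.corrcoef(X_tilde, rowvar=False) - np.eye(p)).max()
print(f"before: {max_off_before:.3f}, after: {max_off_after:.2e}")
```

In the high-dimensional setting the abstract targets, the number of biomarkers typically exceeds the number of patients, so the plain sample covariance is singular and a regularized or structured estimate would be needed instead. Because whitening changes the coefficient vector, a sparsity penalty on the original coefficients becomes a penalty on a linear transform of the new ones, which is consistent with the abstract's use of the generalized Lasso rather than the standard Lasso.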