School of Statistics, Renmin University of China, Beijing, China.
School of Public Health, Yale University, New Haven, Connecticut, USA.
Stat Med. 2021 Apr;40(9):2239-2256. doi: 10.1002/sim.8900. Epub 2021 Feb 8.
Partial least squares (PLS), a dimension reduction technique, has become increasingly important for its ability to handle problems with a large number of variables. Because noisy variables may weaken estimation performance, the sparse partial least squares (SPLS) technique has been proposed to identify important variables and generate more interpretable results. However, the small sample size of a single dataset limits the performance of conventional methods. An effective solution is to pool information from multiple comparable studies. Integrative analysis plays an essential role in the analysis of multiple datasets: its main idea is to improve performance by assembling raw data from multiple independent datasets and analyzing them jointly. In this article, we develop an integrative SPLS (iSPLS) method that applies penalization to the SPLS technique. The proposed approach uses two penalties. The first conducts variable selection in the context of integrative analysis. The second, a contrasted penalty, encourages similarity of estimates across datasets and generates more sensible and accurate results. Computational algorithms are developed. Simulation experiments are conducted to compare iSPLS with alternative approaches. The practical utility of iSPLS is demonstrated in the analysis of two TCGA gene expression datasets.
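To make the two-penalty structure concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the functional form of either penalty, so the sketch assumes a group-lasso-style selection penalty (one group per variable, spanning all datasets) and a squared-difference contrasted penalty between datasets. The names `W`, `lam1`, `lam2`, and all three functions are hypothetical, introduced only for illustration.

```python
import numpy as np


def selection_penalty(W, lam1):
    """Assumed form: lam1 * sum_j ||W[j, :]||_2.

    W is a (p variables x M datasets) matrix of loadings. Grouping each
    variable's loadings across datasets shrinks a whole row toward zero,
    so a variable is selected or dropped in all datasets together.
    """
    return lam1 * np.sum(np.linalg.norm(W, axis=1))


def contrasted_penalty(W, lam2):
    """Assumed form: (lam2 / 2) * sum_j sum_{m<k} (W[j, m] - W[j, k])^2.

    Penalizes differences between the loadings that the same variable
    receives in different datasets, encouraging similar estimates.
    """
    n_datasets = W.shape[1]
    total = 0.0
    for m in range(n_datasets):
        for k in range(m + 1, n_datasets):
            total += np.sum((W[:, m] - W[:, k]) ** 2)
    return 0.5 * lam2 * total


def total_penalty(W, lam1, lam2):
    """Sum of the two (assumed) penalties for a given loading matrix."""
    return selection_penalty(W, lam1) + contrasted_penalty(W, lam2)


# Two variables, two datasets: the second variable is inactive everywhere.
W_example = np.array([[3.0, 4.0],
                      [0.0, 0.0]])
penalty_value = total_penalty(W_example, lam1=1.0, lam2=1.0)
```

In this sketch, raising `lam1` removes more variables from all datasets at once, while raising `lam2` pulls the per-dataset loadings of the retained variables closer together; setting `lam2 = 0` leaves only the joint selection penalty.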