Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA.
Department of Medical Life Sciences, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea.
Biometrics. 2022 Jun;78(2):612-623. doi: 10.1111/biom.13458. Epub 2021 Mar 30.
Classification methods that simultaneously leverage the strengths of data from multiple sources (multiview data) have enormous potential to yield more powerful findings than two-step methods, in which association analysis is followed by classification. We propose two methods for joint association and classification studies: sparse integrative discriminant analysis (SIDA), and SIDA with incorporation of network information (SIDANet). In choosing discriminant vectors, the methods consider both the overall association between the views and the separation within each view, so that the resulting vectors are associated across views and optimally separate subjects into different classes. SIDANet is among the first methods to incorporate prior structural information in joint association and classification studies. It uses the normalized Laplacian of a graph to smooth the coefficients of predictor variables, thus encouraging the selection of predictors that are connected. We demonstrate the effectiveness of our methods on a set of synthetic datasets and explore their use in identifying potential nontraditional risk factors that discriminate healthy patients at low versus high risk of developing atherosclerotic cardiovascular disease within 10 years. Our findings underscore the benefit of joint association and classification methods when the goal is both to correlate multiview data and to perform classification.
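As a rough illustration of the graph-smoothing idea mentioned in the abstract, the sketch below builds the normalized Laplacian of a small graph and evaluates a quadratic smoothness penalty on a coefficient vector. This is not the SIDANet implementation: the function names `normalized_laplacian` and `smoothness_penalty` are hypothetical, and the quadratic form beta' L beta is an assumption borrowed from generic network-constrained regularization, which may differ from the penalty the paper actually uses.

```python
import numpy as np

def normalized_laplacian(adjacency):
    """Return L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix.

    Rows and columns of isolated nodes (degree 0) are set to zero.
    """
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    connected = degrees > 0
    # Scale by 1/sqrt(degree); isolated nodes get a zero scaling factor.
    inv_sqrt = np.zeros_like(degrees)
    inv_sqrt[connected] = 1.0 / np.sqrt(degrees[connected])
    scaled = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    return np.diag(connected.astype(float)) - scaled

def smoothness_penalty(beta, laplacian):
    """Quadratic form beta' L beta (an assumed, generic smoothing penalty).

    It equals the sum over edges (i, j) of
    (beta_i / sqrt(d_i) - beta_j / sqrt(d_j))^2,
    so it is small when connected predictors have similar scaled coefficients.
    """
    beta = np.asarray(beta, dtype=float)
    return float(beta @ laplacian @ beta)

# Hypothetical 3-variable path graph: 1 - 2 - 3.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
L = normalized_laplacian(A)

smooth = [1.0, np.sqrt(2.0), 1.0]   # coefficients agree along edges after scaling
rough = [1.0, -np.sqrt(2.0), 1.0]   # sign flips across both edges

print(smoothness_penalty(smooth, L))
print(smoothness_penalty(rough, L))
```

Under these assumptions, the first coefficient vector incurs essentially no penalty while the second is penalized heavily, which is the sense in which a Laplacian-based term favors coefficients that vary smoothly over connected predictors.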