Aminu Muhammad, Hong Lingzhi, Vokes Natalie, Schmidt Stephanie T, Saad Maliazurina, Zhu Bo, Le Xiuning, Cascone Tina, Sheshadri Ajay, Wang Bo, Jaffray David, Futreal Andy, Lee J Jack, Byers Lauren A, Gibbons Don, Heymach John, Chen Ken, Cheng Chao, Zhang Jianjun, Wu Jia
Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
Res Sq. 2024 May 17:rs.3.rs-4353037. doi: 10.21203/rs.3.rs-4353037/v1.
Integrative multi-omics analysis provides deeper insight and enables better and more realistic modeling of the underlying biology and causes of diseases than single-omics analysis does. Although several integrative multi-omics analysis methods have been proposed and have shown promising results in integrating distinct omics datasets, inconsistent distributions across the different omics data, caused by technology variations, pose a challenge for paired integrative multi-omics methods. In addition, the existing discriminant analysis-based integrative methods do not effectively exploit correlation and consistent discriminant structures, so using them requires a compromise between correlation and discrimination. Herein we present PAN-omics Discriminant Analysis (PANDA), a joint discriminant analysis method that seeks omics-specific discriminant common spaces by jointly learning consistent discriminant latent representations for each omics. PANDA jointly maximizes between-class and minimizes within-class omics variations in a common space and simultaneously models the relationships among omics at the consistency representation and cross-omics correlation levels, removing the need to compromise between discrimination and correlation that the existing integrative multi-omics methods impose. Because consistency representation learning is incorporated into the objective function of PANDA, the method seeks a common discriminant space that minimizes the differences in distributions among omics, can yield more robust latent representations than other methods, and is robust against inconsistency among the different omics. We compared PANDA to 10 other state-of-the-art multi-omics data integration methods using both simulated and real-world multi-omics datasets and found that PANDA consistently outperformed them while providing meaningful discriminant latent representations.
PANDA is implemented in both R and MATLAB, with code available at https://github.com/WuLabMDA/PANDA.
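To make the idea of maximizing between-class variation while minimizing within-class variation concrete, the following is a minimal Python sketch of the classical Fisher criterion for a single data matrix. It is not the PANDA objective and does not include the consistency or cross-omics correlation terms described above. The function names `project` and `fisher_ratio` and the toy data are hypothetical and introduced only for illustration.

```python
from collections import defaultdict


def project(samples, w):
    """Project each sample (a list of floats) onto direction w (hypothetical helper)."""
    return [sum(x_i * w_i for x_i, w_i in zip(x, w)) for x in samples]


def fisher_ratio(scores, labels):
    """Between-class scatter divided by within-class scatter for 1-D scores.

    This is the classical Fisher criterion, used here only as an
    illustration; it is an assumption of this sketch, not PANDA's objective.
    """
    groups = defaultdict(list)
    for s, y in zip(scores, labels):
        groups[y].append(s)

    overall_mean = sum(scores) / len(scores)
    between = 0.0
    within = 0.0
    for members in groups.values():
        class_mean = sum(members) / len(members)
        # Spread of the class means around the overall mean.
        between += len(members) * (class_mean - overall_mean) ** 2
        # Spread of the samples around their own class mean.
        within += sum((s - class_mean) ** 2 for s in members)

    if within == 0.0:
        raise ValueError("within-class scatter is zero; ratio is undefined")
    return between / within


# Invented toy data: feature 0 separates the two classes, feature 1 does not.
samples = [[1.0, 5.0], [2.0, 1.0], [1.5, 3.0],
           [4.0, 4.5], [5.0, 1.5], [4.5, 3.0]]
labels = [0, 0, 0, 1, 1, 1]

informative = fisher_ratio(project(samples, [1.0, 0.0]), labels)
uninformative = fisher_ratio(project(samples, [0.0, 1.0]), labels)
print(informative, uninformative)
```

A projection direction with a larger ratio separates the classes better. A discriminant method searches for directions that make this kind of ratio large, and a multi-omics method would do so jointly across several data matrices rather than one.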