Chen Jinyu, Min Wenwen
School of Mathematics, Statistics and Mechanics, Beijing University of Technology, 100 Pingleyuan, Chaoyang District, Beijing 100124, China.
School of Information Science and Engineering, Yunnan University, East Outer Ring Road, Chenggong District, Kunming 650500, China.
Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf195.
The rapidly emerging large-scale data in diverse biological research fields present valuable opportunities to explore the underlying mechanisms of tissue development and disease progression. However, few existing methods can simultaneously capture common and condition-specific associations between different types of features across different biological conditions, such as cancer types or cell populations. Therefore, we developed the sparse tensor-based partial least squares (sTPLS) method, which integrates multiple pairs of datasets that contain the same two types of features but are derived from different biological conditions. We demonstrated the effectiveness and versatility of sTPLS through a simulation study and three biological applications. By integrating pairwise pharmacogenomic data, sTPLS identified 11 gene-drug comodules with high biological functional relevance that were specific to seven cancer types, and two comodules that were shared across multiple cancer types, such as breast, ovarian, and colorectal cancers. When applied to single-cell data, it uncovered nine gene-peak comodules representing transcriptional regulatory relationships specific to five cell types, and three comodules shared across similar cell types, such as intermediate and naïve B cells. Furthermore, sTPLS can be applied directly to tensor-structured data, where it revealed shared and distinct cell communication patterns mediated by the MK signaling pathway in coronavirus disease 2019 (COVID-19) patients and healthy controls. These results highlight the effectiveness of sTPLS in identifying biologically meaningful relationships across diverse conditions, making it useful for integrative multi-omics analysis.