Bipartite graph-based approach for clustering of cell lines by gene expression-drug response associations.

Author information

Chi Calvin, Ye Yuting, Chen Bin, Huang Haiyan

Affiliations

Center of Computational Biology, College of Engineering, University of California, Berkeley, CA 94720, USA.

Division of Biostatistics, University of California, Berkeley, CA 94720, USA.

Publication information

Bioinformatics. 2021 Sep 9;37(17):2617-2626. doi: 10.1093/bioinformatics/btab143.

DOI:10.1093/bioinformatics/btab143

PMID:33682877

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8428606/

Abstract

MOTIVATION

In pharmacogenomic studies, the biological context of cell lines influences the predictive ability of drug-response models and the discovery of biomarkers. Thus, similar cell lines are often studied together based on prior knowledge of biological annotations. However, this selection approach is not scalable with the number of annotations, and the relationship between gene-drug association patterns and biological context may not be obvious.

RESULTS

We present a procedure to compare cell lines based on their gene-drug association patterns. Starting with a grouping of cell lines from biological annotation, we model gene-drug association patterns for each group as a bipartite graph between genes and drugs. This is accomplished by applying sparse canonical correlation analysis (SCCA) to extract the gene-drug associations, and using the canonical vectors to construct the edge weights. Then, we introduce a nuclear norm-based dissimilarity measure to compare the bipartite graphs. Accompanying our procedure is a permutation test to evaluate the significance of similarity of cell line groups in terms of gene-drug associations. In the pharmacogenomic datasets CTRP2, GDSC2 and CCLE, hierarchical clustering of carcinoma groups based on this dissimilarity measure uniquely reveals clustering patterns driven by carcinoma subtype rather than primary site. Next, we show that the top associated drugs or genes from SCCA can be used to characterize the clustering patterns of haematopoietic and lymphoid malignancies. Finally, we confirm by simulation that when drug responses are linearly dependent on expression, our approach is the only one that can effectively infer the true hierarchy compared to existing approaches.
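The comparison step described above can be illustrated with a minimal Python sketch. It is not the hierBipartite implementation: the canonical vectors below are hand-written hypothetical values rather than SCCA output, the edge weights are assumed to be the outer product of one gene vector and one drug vector, and the normalization by the sum of the two nuclear norms is an assumption chosen so that the dissimilarity falls in [0, 1]. The paper's exact definitions may differ.

```python
import numpy as np


def edge_weights(u, v):
    """Gene-by-drug edge-weight matrix from one pair of canonical vectors.

    Assumption: weights are the outer product of the gene vector u and the
    drug vector v; the paper's exact construction may differ.
    """
    return np.outer(np.asarray(u, dtype=float), np.asarray(v, dtype=float))


def nuclear_norm(m):
    """Nuclear norm of m, i.e. the sum of its singular values."""
    return float(np.linalg.svd(m, compute_uv=False).sum())


def dissimilarity(b1, b2):
    """Nuclear norm of the difference, scaled into [0, 1].

    The scaling by nuclear_norm(b1) + nuclear_norm(b2) is an assumed
    normalization; the triangle inequality keeps the ratio at most 1.
    """
    denom = nuclear_norm(b1) + nuclear_norm(b2)
    return 0.0 if denom == 0.0 else nuclear_norm(b1 - b2) / denom


# Hypothetical canonical vectors for three cell line groups (4 genes, 3 drugs).
graphs = {
    "group_a": edge_weights([0.8, 0.6, 0.0, 0.0], [1.0, 0.0, 0.0]),
    "group_b": edge_weights([0.6, 0.8, 0.0, 0.0], [1.0, 0.0, 0.0]),
    "group_c": edge_weights([0.0, 0.0, 0.6, 0.8], [0.0, 0.0, 1.0]),
}

# Pairwise dissimilarity matrix between the group-level bipartite graphs.
names = list(graphs)
dist = np.array(
    [[dissimilarity(graphs[a], graphs[b]) for b in names] for a in names]
)
print(names)
print(np.round(dist, 3))
```

In this toy setup, group_a and group_b share the same genes and drug and so end up close to each other, while group_c involves disjoint genes and drugs and sits at the maximum distance from both. A matrix like dist could then be passed to any standard hierarchical clustering routine.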

AVAILABILITY AND IMPLEMENTATION

Bipartite graph-based hierarchical clustering is implemented in R and can be obtained from CRAN: https://CRAN.R-project.org/package=hierBipartite. The source code is available at https://github.com/CalvinTChi/hierBipartite. The datasets were derived from sources in the public domain, which are the Cancer Cell Line Encyclopedia (https://portals.broadinstitute.org/ccle), the Cancer Therapeutics Response Portal (https://portals.broadinstitute.org/ctrp.v2.1/?page=#ctd2BodyHome), and the Genomics of Drug Sensitivity in Cancer (https://www.cancerrxgene.org/). These datasets can be downloaded using the PharmacoGx R package (https://bioconductor.org/packages/release/bioc/html/PharmacoGx.html).

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

Predicting cancer drug response using parallel heterogeneous graph convolutional networks with neighborhood interactions.

Bioinformatics. 2022 Sep 30;38(19):4546-4553. doi: 10.1093/bioinformatics/btac574.

IPCT: Integrated Pharmacogenomic Platform of Human Cancer Cell Lines and Tissues.

Genes (Basel). 2019 Feb 22;10(2):171. doi: 10.3390/genes10020171.

PharmacoGx: an R package for analysis of large pharmacogenomic datasets.

Bioinformatics. 2016 Apr 15;32(8):1244-6. doi: 10.1093/bioinformatics/btv723. Epub 2015 Dec 9.

decoupleR: ensemble of computational methods to infer biological activities from omics data.

Bioinform Adv. 2022 Mar 8;2(1):vbac016. doi: 10.1093/bioadv/vbac016. eCollection 2022.

scBGEDA: deep single-cell clustering analysis via a dual denoising autoencoder with bipartite graph ensemble clustering.

Bioinformatics. 2023 Feb 14;39(2). doi: 10.1093/bioinformatics/btad075.

graphkernels: R and Python packages for graph comparison.

Bioinformatics. 2018 Feb 1;34(3):530-532. doi: 10.1093/bioinformatics/btx602.

PrInCE: an R/Bioconductor package for protein-protein interaction network inference from co-fractionation mass spectrometry data.

Bioinformatics. 2021 Sep 9;37(17):2775-2777. doi: 10.1093/bioinformatics/btab022.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

cytomapper: an R/Bioconductor package for visualization of highly multiplexed imaging data.

Bioinformatics. 2021 Apr 5;36(24):5706-5708. doi: 10.1093/bioinformatics/btaa1061.

Cited by

Integrative Analysis of Immune- and Metabolism-Related Genes Identifies Robust Prognostic Signature and PYCR1 as a Carcinogenic Regulator in Clear Cell Renal Cell Carcinoma.

Int J Mol Sci. 2025 May 21;26(10):4953. doi: 10.3390/ijms26104953.

Comprehensive pan-cancer analysis reveals NTN1 as an immune infiltrate risk factor and its potential prognostic value in SKCM.

Sci Rep. 2025 Jan 25;15(1):3223. doi: 10.1038/s41598-025-85444-x.

Integrating single-cell RNA-seq and bulk RNA-seq to construct a neutrophil prognostic model for predicting prognosis and immune response in oral squamous cell carcinoma.

Hum Genomics. 2024 Dec 26;18(1):140. doi: 10.1186/s40246-024-00712-7.

Glioblastoma vulnerability to neddylation inhibition is dependent on PTEN status, and dysregulation of the cell cycle and DNA replication.

Neurooncol Adv. 2024 Jun 20;6(1):vdae104. doi: 10.1093/noajnl/vdae104. eCollection 2024 Jan-Dec.

Snowflake: visualizing microbiome abundance tables as multivariate bipartite graphs.

Front Bioinform. 2024 Feb 5;4:1331043. doi: 10.3389/fbinf.2024.1331043. eCollection 2024.

Comprehensive pan-cancer analysis reveals CCDC58 as a carcinogenic factor related to immune infiltration.

Apoptosis. 2024 Apr;29(3-4):536-555. doi: 10.1007/s10495-023-01919-0. Epub 2023 Dec 8.

Multimorbidity prediction using link prediction.

Sci Rep. 2021 Aug 12;11(1):16392. doi: 10.1038/s41598-021-95802-0.
