Department of Computer Science, Xiamen University, Xiamen, Fujian, China.
Department of Chemistry and Biochemistry, Ohio University, Athens, OH, United States of America.
PLoS Comput Biol. 2019 Feb 19;15(2):e1006772. doi: 10.1371/journal.pcbi.1006772. eCollection 2019 Feb.
Recent advances in next-generation sequencing and computational technologies have enabled routine analysis of large-scale single-cell ribonucleic acid sequencing (scRNA-seq) data. However, scRNA-seq technologies suffer from several technical challenges, including low mean expression levels for most genes and higher frequencies of missing data than bulk population sequencing technologies. Identifying functional gene sets and their regulatory networks that link specific cell types to human diseases and therapeutics from scRNA-seq profiles is a daunting task. In this study, we developed a Component Overlapping Attribute Clustering (COAC) algorithm to perform localized (cell subpopulation) gene co-expression network analysis on large-scale scRNA-seq profiles. Gene subnetworks that represent specific gene co-expression patterns are inferred from the components of a decomposed matrix of scRNA-seq profiles. We showed that single-cell gene subnetworks identified by COAC from multiple time points within cell phases can be used for cell type identification with high accuracy (83%). In addition, COAC-inferred subnetworks from melanoma patients' scRNA-seq profiles are highly correlated with survival rate from The Cancer Genome Atlas (TCGA). Moreover, the localized gene subnetworks identified by COAC from individual patients' scRNA-seq data can be used as pharmacogenomics biomarkers to predict drug responses in cancer cell lines from the Genomics of Drug Sensitivity in Cancer (GDSC) database (area under the receiver operating characteristic curve ranging from 0.728 to 0.783). In summary, COAC offers a powerful tool to identify potential network-based diagnostic and pharmacogenomics biomarkers from large-scale scRNA-seq profiles. COAC is freely available at https://github.com/ChengF-Lab/COAC.
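The idea of inferring gene subnetworks from the components of a decomposed expression matrix can be illustrated with a minimal sketch. This is not the COAC implementation: it assumes a plain singular value decomposition as the decomposition step and a simple "top loadings" rule for picking genes, and the function name `component_gene_sets`, its parameters, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def component_gene_sets(expr, n_components=2, top_k=3):
    """Return one candidate gene set (sorted gene indices) per component.

    expr is a cells x genes matrix. SVD and the top-k rule are
    illustrative assumptions, not the published COAC procedure.
    """
    # Center each gene so components capture co-variation, not mean level.
    centered = expr - expr.mean(axis=0, keepdims=True)
    # Rows of vt hold the per-component gene loadings.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    gene_sets = []
    for loadings in vt[:n_components]:
        # Genes with the largest absolute loading move together
        # along this component.
        top = np.argsort(-np.abs(loadings))[:top_k]
        gene_sets.append(np.sort(top))
    return gene_sets


# Synthetic data (hypothetical): 200 cells, 8 genes, two planted programs.
rng = np.random.default_rng(0)
n_cells = 200
program_a = rng.normal(size=n_cells)
program_b = rng.normal(size=n_cells)
expr = 0.1 * rng.normal(size=(n_cells, 8))
expr[:, 0:3] += 3.0 * program_a[:, None]  # genes 0-2 follow program a
expr[:, 3:6] += 1.5 * program_b[:, None]  # genes 3-5 follow program b

gene_sets = component_gene_sets(expr, n_components=2, top_k=3)
for i, genes in enumerate(gene_sets):
    print(f"component {i}: genes {genes.tolist()}")
```

On this synthetic matrix, each of the two leading components should pick out one of the planted gene groups. Real scRNA-seq data would also need normalization and handling of missing values, which this sketch leaves out.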