State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou 510060, China.
School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510060, China.
Brief Bioinform. 2020 Sep 25;21(5):1818-1824. doi: 10.1093/bib/bbz116.
Unsupervised clustering of high-throughput gene expression data is widely used for cancer subtyping. However, cancer subtypes derived from a single dataset are usually not applicable across multiple datasets from different platforms. Merging datasets is necessary to determine accurate and broadly applicable cancer subtypes, but it remains difficult because of batch effects. CrossICC is an R package designed for unsupervised clustering of gene expression data from multiple datasets or platforms without requiring batch effect adjustment. CrossICC uses an iterative strategy to derive the optimal gene signature and number of clusters from a consensus similarity matrix generated by consensus clustering. The package also provides a range of functions to visualize the identified subtypes and to evaluate subtyping performance. We expect that CrossICC can be used to discover robust cancer subtypes with significant translational implications for the personalized care of cancer patients.
The package is implemented in R and available at GitHub (https://github.com/bioinformatist/CrossICC) and Bioconductor (http://bioconductor.org/packages/release/bioc/html/CrossICC.html) under the GPL v3 License.
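The consensus similarity matrix mentioned above can be illustrated with a minimal sketch. This is not CrossICC's implementation or API: it is written in plain Python rather than R, it assumes generic resampling-based consensus clustering with k-means as the base clusterer, and the names `kmeans` and `consensus_matrix`, the parameters, and the toy data are all hypothetical. Each entry of the matrix is the fraction of resampling runs in which two samples, when drawn together, were assigned to the same cluster.

```python
import random


def kmeans(points, k, rng, n_iter=25):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Move each center to the mean of its members; leave it in place if it has none.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k, n_resamples=50, frac=0.8, seed=0):
    """Fraction of resampling runs in which each pair of samples co-clustered (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(points)
    size = max(k, int(frac * n))
    together = [[0] * n for _ in range(n)]  # times a pair shared a cluster
    sampled = [[0] * n for _ in range(n)]   # times a pair was drawn in the same subsample
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Toy data (an assumption for illustration): two well-separated groups of 10 samples, 5 features each.
data_rng = random.Random(42)
group_a = [tuple(data_rng.gauss(0.0, 1.0) for _ in range(5)) for _ in range(10)]
group_b = [tuple(data_rng.gauss(10.0, 1.0) for _ in range(5)) for _ in range(10)]
M = consensus_matrix(group_a + group_b, k=2)
print(M[0][1], M[0][19])  # within-group consensus, then between-group consensus
```

In this toy setting, pairs from the same group should have consensus values close to 1 and pairs from different groups values close to 0. An iterative procedure of the kind the abstract describes would presumably repeat such a step while refining the gene signature and the number of clusters, but those details are not given here and are not modeled in the sketch.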