Division of Biostatistics, Ohio State University, Columbus, OH 43210, USA.
Whole-Genome Research Core Laboratory of Human Diseases, Chang Gung Memorial Hospital, Keelung 204, Taiwan.
Bioinformatics. 2021 Sep 29;37(18):3026-3028. doi: 10.1093/bioinformatics/btab183.
In this article, we introduce a method that combines hierarchical clustering with a Gaussian mixture model, fitted by the expectation-maximization (EM) algorithm, for detecting copy number variants (CNVs) from whole-exome sequencing (WES) data. We also developed the R Shiny package 'HCMMCNVs' for processing user-provided BAM files, running the CNV detection algorithm and visualizing the results. By applying our approach to 325 cancer cell lines across 22 tumor types from the Cancer Cell Line Encyclopedia (CCLE), we show that our algorithm is competitive with existing methods and that using multiple cancer cell lines for CNV estimation is feasible. In addition, when applied to WES data from 120 oral squamous cell carcinoma (OSCC) samples, our algorithm, using tumor samples only, exhibits more power to detect CNVs than methods that use both tumors and matched normal counterparts.
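To illustrate the mixture-model step in general terms, the following is a minimal sketch of EM for a one-dimensional Gaussian mixture in Python. It is not the HCMMCNVs implementation: the use of log2 coverage ratios, the three components (loss, neutral, gain), the simulated data and the function name `fit_gmm_em` are all assumptions made for illustration, and the hierarchical clustering step is not shown.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_gmm_em(values, means, sds, weights, n_iter=200, tol=1e-8, min_sd=1e-3):
    """Fit a 1-D Gaussian mixture by EM, starting from the given parameters.

    Hypothetical helper for illustration; returns the fitted means, standard
    deviations, mixing weights and per-value posterior probabilities.
    """
    means, sds, weights = list(means), list(sds), list(weights)
    k = len(means)
    prev_ll = -math.inf
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        ll = 0.0
        for x in values:
            dens = [weights[j] * normal_pdf(x, means[j], sds[j]) for j in range(k)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the weighted values.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(values)
            if nj < 1e-12:
                continue  # empty component: keep its previous mean and sd
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            sds[j] = max(math.sqrt(var), min_sd)
        # Stop once the log-likelihood no longer improves.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return means, sds, weights, resp


# Simulated log2 ratios (assumed values): one-copy loss, neutral, one-copy gain.
random.seed(0)
true_means = [-1.0, 0.0, math.log2(1.5)]
data = [random.gauss(m, 0.1)
        for m, n in zip(true_means, [60, 200, 60])
        for _ in range(n)]

means, sds, weights, resp = fit_gmm_em(
    data, means=[-0.8, 0.1, 0.8], sds=[0.3] * 3, weights=[1 / 3] * 3)

# Assign each value to its most probable component (0 = loss, 1 = neutral, 2 = gain).
labels = [max(range(3), key=lambda j: r[j]) for r in resp]
```

In this sketch the posterior probabilities in `resp` play the role of soft copy-number calls; a real analysis would derive the input values from normalized read depth in the BAM files rather than from simulation.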
The HCMMCNVs R Shiny software is freely available from the GitHub repository https://github.com/lunching/HCMM_CNVs and from Zenodo at https://doi.org/10.5281/zenodo.4593371.
Supplementary data are available at Bioinformatics online.