CaMutQC: An R package for integrative quality control and filtration of cancer somatic mutations.

Authors

Wang Xin, Jiang Tengjia, Shen Ao, Chen Yaru, Zhou Yanqing, Liu Jie, Zhao Shuhan, Chen Shifu, Ren Jian, Zhao Qi

Affiliations

State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China.

Department of Bioinformatics, HaploX Biotechnology, Shenzhen 518057, China.

Publication information

Comput Struct Biotechnol J. 2025 Jul 16;27:3147-3154. doi: 10.1016/j.csbj.2025.07.011. eCollection 2025.

Abstract

The quality control and filtration of cancer somatic mutations (CAMs), including the elimination of false positives caused by technical bias and the selection of key mutation candidates, are crucial steps for downstream analysis in cancer genomics. However, because needs differ and standardized filtering criteria are lacking, filtering strategies vary from study to study, often reducing efficiency, accuracy, and reproducibility. Here, we present CaMutQC, a heuristic quality control and soft-filtering R/Bioconductor package designed specifically for CAMs. CaMutQC enables users to remove false positive mutations, select potential mutation candidates, and estimate tumor mutation burden (TMB) with a single line of code, using either default or customized parameters. A filter report and a code log can also be generated after filtration to facilitate reproducibility and comparison. Applied to a whole-exome sequencing (WES) benchmark dataset, CaMutQC eliminated 85.55% of false positive single nucleotide variants (SNVs) while retaining 90.72% of true positive SNVs. In addition, a further 11.56% of true positive SNVs were rescued through CaMutQC's built-in union strategy. Similar results were observed for insertions and deletions (INDELs). CaMutQC is freely available through Bioconductor at https://bioconductor.org/packages/CaMutQC/ under the GPL v3 license.
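
To make the terms "soft-filtering" and "TMB" concrete, the following is a minimal Python sketch of the general idea, not CaMutQC's R API or its actual filters. It assumes that soft-filtering means tagging variants that fail a check rather than deleting them, and that TMB is reported as passing mutations per megabase of covered sequence. The `Variant` class, the function names, and the depth and allele-frequency thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int    # total read depth at the site
    vaf: float    # variant allele frequency in the tumor sample
    flags: list = field(default_factory=list)  # soft-filter tags; empty means PASS


def soft_filter(variants, min_depth=10, min_vaf=0.05):
    """Tag variants that fail a check instead of removing them.

    The thresholds are illustrative assumptions, not CaMutQC defaults.
    """
    for v in variants:
        if v.depth < min_depth:
            v.flags.append("low_depth")
        if v.vaf < min_vaf:
            v.flags.append("low_vaf")
    return variants


def passing(variants):
    """Return only the variants that carry no soft-filter tag."""
    return [v for v in variants if not v.flags]


def estimate_tmb(variants, covered_bases):
    """Passing mutations per megabase of covered sequence."""
    return len(passing(variants)) / (covered_bases / 1_000_000)
```

Because flagged variants stay in the list, a different threshold set can be applied later, or the tags can be inspected to see why a call was rejected, without re-reading the input.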

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b500/12302822/37ca805dec10/ga1.jpg
