Department of Molecular Physiology and Biophysics, Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, 37232, U.S.A.
Department of General Surgery, Tangdu Hospital, Fourth Military Medical University, Xi'an, 710032, China.
Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac200.
Cell-free DNA (cfDNA) provides a convenient diagnostic avenue for noninvasive cancer detection. Current methods focus on identifying genomic aberrations in circulating tumor DNA (ctDNA), e.g. mutations, copy number aberrations (CNAs) or methylation changes. In this study, we report a new computational method that unifies two orthogonal pieces of information, namely methylation and CNAs, derived from whole-genome bisulfite sequencing (WGBS) data to quantify low tumor content in cfDNA. It implements a Bayes model to enrich ctDNA from WGBS data based on hypomethylation haplotypes and subsequently models CNAs for cancer detection. We generated WGBS data for a total of 262 samples, including high-depth (>20×, deduped high mapping quality reads) data for 76 samples with matched triplets (tumor, adjacent normal and cfDNA) and low-depth (~2.5×, deduped high mapping quality reads) data for 186 samples. We identified a total of 54 Mb of hypomethylation haplotype regions for model building, the vast majority of which are not covered by the HumanMethylation450 arrays. We showed that our model is able to substantially enrich ctDNA reads (by tens of fold), with clearly elevated CNAs that faithfully match the CNAs in the paired tumor samples. In the 19 hepatocellular carcinoma cfDNA samples, the estimated enrichment is as high as 16-fold, and in the simulation data, it can achieve over 30-fold enrichment for a ctDNA level of 0.5% with a sequencing depth of 600×. We also found that these hypomethylation regions are shared among many cancer types, thus demonstrating the potential of our framework for pan-cancer early detection.
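The abstract does not spell out the Bayes model, so the following is only a minimal sketch of the general idea of scoring a read by its methylation haplotype; it is not the authors' implementation. It assumes a naive-Bayes setup in which CpG sites on a read are independent, and every name and number in it (`tumor_posterior`, `fold_enrichment`, the per-CpG methylation probabilities, the 0.5% prior) is hypothetical and chosen for illustration.

```python
import math


def tumor_posterior(read_calls, p_meth_tumor, p_meth_normal, prior_tumor):
    """Posterior probability that a single read is tumor-derived.

    read_calls    : 0/1 methylation calls at the CpGs covered by the read
    p_meth_tumor  : assumed per-CpG methylation probability in tumor DNA
    p_meth_normal : assumed per-CpG methylation probability in normal DNA
    prior_tumor   : assumed prior fraction of tumor-derived reads
    CpGs are treated as independent (a simplifying assumption).
    """
    log_tumor = math.log(prior_tumor)
    log_normal = math.log(1.0 - prior_tumor)
    for call, pt, pn in zip(read_calls, p_meth_tumor, p_meth_normal):
        log_tumor += math.log(pt if call else 1.0 - pt)
        log_normal += math.log(pn if call else 1.0 - pn)
    # Normalize in log space to avoid underflow on long reads.
    top = max(log_tumor, log_normal)
    w_tumor = math.exp(log_tumor - top)
    w_normal = math.exp(log_normal - top)
    return w_tumor / (w_tumor + w_normal)


def fold_enrichment(fraction_before, fraction_after):
    """Ratio of the tumor fraction after read selection to the fraction before."""
    return fraction_after / fraction_before


# Hypothetical region: hypomethylated in tumor (0.2), methylated in normal (0.9).
pt = [0.2] * 4
pn = [0.9] * 4
unmethylated_read = [0, 0, 0, 0]
methylated_read = [1, 1, 1, 1]

post_unmeth = tumor_posterior(unmethylated_read, pt, pn, prior_tumor=0.005)
post_meth = tumor_posterior(methylated_read, pt, pn, prior_tumor=0.005)
```

Under these assumed parameters, a fully unmethylated read receives a high tumor posterior even with a 0.5% prior, while a fully methylated read receives a posterior near zero; keeping only high-posterior reads is what would raise the tumor fraction. As plain arithmetic on the figures quoted above, a 30-fold enrichment starting from a 0.5% ctDNA level would correspond to a tumor fraction of roughly 15% among the retained reads, which is what `fold_enrichment(0.005, 0.15)` expresses.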