Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
Medical Research Council (MRC) Cancer Unit, University of Cambridge, Cambridge, UK.
BMC Genomics. 2022 Aug 17;23(1):599. doi: 10.1186/s12864-022-08681-8.
Somatic copy number alterations (SCNAs) are an important class of genomic alteration in cancer. They are frequently observed in cancer samples, with studies showing that, on average, SCNAs affect 34% of a cancer cell's genome. Furthermore, SCNAs have been shown to be major drivers of tumour development and have been associated with response to therapy and prognosis. Large-scale cancer genome studies suggest that tumours are driven by either SCNAs or single-nucleotide variants (SNVs). Despite the frequency of SCNAs and their clinical relevance, clinical genomic testing is biased towards targeted gene panels, which identify SNVs but offer limited scope to detect SCNAs across the genome. There is therefore a need for a similarly low-cost and simple method for high-resolution SCNA profiling.
We present conliga, a fully probabilistic method that infers SCNA profiles from a low-cost, simple and clinically relevant assay (FAST-SeqS). When applied to 11 high-purity oesophageal adenocarcinoma samples, conliga's SCNA profiles inferred from FAST-SeqS data (approximately £14 per sample) agree well (Spearman's rank correlation coefficient, r=0.94) with those inferred by ASCAT from high-coverage whole-genome sequencing (WGS), the gold standard. We find that conliga outperforms CNVkit (r=0.89), also applied to FAST-SeqS data, and is comparable to QDNAseq (r=0.96) applied to low-coverage WGS, which is approximately four-fold more expensive, more laborious and less clinically relevant. Using an in silico dilution series experiment, we find that conliga is particularly well suited to detecting SCNAs in samples of low tumour purity. At two million reads per sample, conliga detects SCNAs in all nine samples at 3% tumour purity, and at purities as low as 0.5% in one sample. Crucially, we show that conliga's hidden state information can be used to decide whether a sample is abnormal or normal, whereas CNVkit and QDNAseq cannot provide this critical information.
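The abstract measures agreement between methods with Spearman's rank correlation and probes sensitivity with an in silico dilution series. The Python sketch below illustrates both ideas under stated assumptions: it treats each profile as a list of per-bin copy number values on the same matched genomic bins, and it models dilution with a simple linear mixture of tumour and diploid normal DNA that ignores normalisation by average tumour ploidy. The function names are hypothetical; this is not conliga's code, nor the paper's evaluation or read-mixing procedure.

```python
"""Hypothetical sketch (not conliga's implementation): Spearman agreement
between two per-bin copy number profiles, and the expected signal of a bin
after diluting tumour DNA with normal DNA."""

from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must cover the same bins (at least two)")
    return pearson(average_ranks(x), average_ranks(y))


def diluted_copy_ratio(tumour_cn: float, purity: float,
                       normal_cn: float = 2.0) -> float:
    """Expected copy ratio of a bin (relative to a diploid normal) when a
    fraction `purity` of the DNA comes from the tumour.

    Assumes a linear mixture and ignores tumour ploidy normalisation."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    mixed = purity * tumour_cn + (1.0 - purity) * normal_cn
    return mixed / normal_cn
```

For example, `spearman([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])` returns 1.0 because only the ordering of bins matters, not the scale. Under this simplified mixture model, a single-copy gain (`diluted_copy_ratio(3, 0.03)`) shifts the expected ratio only to about 1.015 at 3% purity, which illustrates why detecting SCNAs in low-purity samples is demanding.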
We show that conliga provides high-resolution SCNA profiles using a convenient, low-cost assay. We believe conliga makes FAST-SeqS a more clinically valuable assay as well as a useful research tool, enabling inexpensive and fast copy number profiling of pre-malignant and cancer samples.