Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data.

Authors

Favero F, Joshi T, Marquard A M, Birkbak N J, Krzystanek M, Li Q, Szallasi Z, Eklund A C

Affiliations

Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark.

Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark; Medical School, Xiamen University, Xiamen, China.

Publication information

Ann Oncol. 2015 Jan;26(1):64-70. doi: 10.1093/annonc/mdu479. Epub 2014 Oct 15.

Abstract

BACKGROUND

Exome or whole-genome deep sequencing of tumor DNA along with paired normal DNA can potentially provide a detailed picture of the somatic mutations that characterize the tumor. However, analysis of such sequence data can be complicated by the presence of normal cells in the tumor specimen, by intratumor heterogeneity, and by the sheer size of the raw data. In particular, determination of copy number variations from exome sequencing data alone has proven difficult; thus, single nucleotide polymorphism (SNP) arrays have often been used for this task. Recently, algorithms to estimate absolute, but not allele-specific, copy number profiles from tumor sequencing data have been described.

MATERIALS AND METHODS

We developed Sequenza, a software package that uses paired tumor-normal DNA sequencing data to estimate tumor cellularity and ploidy, and to calculate allele-specific copy number profiles and mutation profiles. We applied Sequenza, as well as two previously published algorithms, to exome sequence data from 30 tumors from The Cancer Genome Atlas. We assessed the performance of these algorithms by comparing their results with those generated using matched SNP arrays and processed by the allele-specific copy number analysis of tumors (ASCAT) algorithm.
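To make the roles of cellularity and ploidy concrete, the following is a minimal sketch of the kind of two-population mixture model commonly used to relate them to observable quantities. It is not taken from the Sequenza source; the function names, the assumption of a diploid normal genome, and the assumption that the depth ratio is normalized to a genome-wide average of 1 are all illustrative choices.

```python
def expected_depth_ratio(cellularity, ploidy, tumor_cn, normal_cn=2):
    """Expected normalized tumor/normal depth ratio for one segment.

    Assumption (not from the abstract): the sample is a mix of tumor cells
    (fraction `cellularity`, total copy number `tumor_cn` in this segment,
    genome-wide average copy number `ploidy`) and normal cells with
    `normal_cn` copies everywhere.
    """
    segment_copies = cellularity * tumor_cn + (1.0 - cellularity) * normal_cn
    average_copies = cellularity * ploidy + (1.0 - cellularity) * normal_cn
    return segment_copies / average_copies


def expected_baf(cellularity, tumor_cn, minor_cn, normal_cn=2):
    """Expected minor-allele (B-allele) frequency at a germline heterozygous site.

    Assumption: normal cells carry normal_cn / 2 copies of the minor allele,
    and tumor cells carry `minor_cn` copies out of `tumor_cn` in total.
    """
    minor_copies = cellularity * minor_cn + (1.0 - cellularity) * (normal_cn / 2)
    total_copies = cellularity * tumor_cn + (1.0 - cellularity) * normal_cn
    return minor_copies / total_copies


if __name__ == "__main__":
    # Hypothetical segment: 50% tumor content, diploid tumor, 4 copies in the segment.
    print(expected_depth_ratio(cellularity=0.5, ploidy=2, tumor_cn=4))
    # Hypothetical copy-neutral loss of heterozygosity at 50% tumor content.
    print(expected_baf(cellularity=0.5, tumor_cn=2, minor_cn=0))
```

Under such a model, a fitting procedure would search for the cellularity and ploidy values whose predicted depth ratios and allele frequencies best match the observed segments; how Sequenza performs that search is not described in this abstract.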

RESULTS

Comparison between Sequenza/exome and SNP/ASCAT revealed strong correlation in cellularity estimates (Pearson's r = 0.90) and in ploidy estimates (r = 0.42, or r = 0.94 after manually inspecting alternative solutions). This performance was noticeably superior to that of previously published algorithms. In addition, in artificial data simulating normal-tumor admixtures, Sequenza detected the correct ploidy in samples with tumor content as low as 30%.
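For readers unfamiliar with the agreement metric, Pearson's r between two sets of per-sample estimates can be computed as in the sketch below. The function name and the toy numbers are invented for illustration only; they are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented toy values standing in for per-tumor cellularity estimates
    # from two methods; NOT results from the paper.
    exome_estimates = [0.35, 0.50, 0.62, 0.80, 0.91]
    array_estimates = [0.40, 0.48, 0.65, 0.77, 0.95]
    print(round(pearson_r(exome_estimates, array_estimates), 3))
```

Because r measures only linear association, a value near 1 means the two methods rank and scale samples similarly; it does not by itself show that the estimates are numerically identical.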

CONCLUSIONS

The agreement between Sequenza and SNP array-based copy number profiles suggests that exome sequencing alone is sufficient not only for identifying small-scale mutations but also for estimating cellularity and inferring DNA copy number aberrations.
