Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., Belmont, CA, USA.
Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
Nat Biotechnol. 2021 Sep;39(9):1151-1160. doi: 10.1038/s41587-021-00993-6. Epub 2021 Sep 9.
The lack of samples for generating standardized DNA datasets, which are needed to set up a sequencing pipeline or to benchmark the performance of different algorithms, limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, they are useful when setting up a sequencing pipeline: they not only minimize potential biases from technologies, assays and informatics but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor-normal' analyses.