Broad Institute of Harvard and MIT, Cambridge, MA, USA.
Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Commun Biol. 2020 Dec 8;3(1):744. doi: 10.1038/s42003-020-01460-9.
Existing cancer benchmark data sets for human sequencing data rely on germline variants, synthetic methods, or expensive validations, none of which satisfactorily provides a large collection of true somatic variation across a whole genome. Here we propose a data set, Lineage-derived Somatic Truth (LinST), of short somatic mutations in the HT115 colon cancer cell line. The mutations are validated using a known cell lineage, and the data set includes thousands of mutations and a high-confidence region covering 2.7 gigabases per sample.