Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
Bioinformatics. 2010 Feb 15;26(4):518-28. doi: 10.1093/bioinformatics/btp694. Epub 2009 Dec 23.
Somatic amplification of particular genomic regions and selection of cellular lineages with such amplifications drive tumor development. However, pinpointing genes under such selection has been difficult due to the large span of these regions. Our recently developed method, the amplification distortion test (ADT), identifies specific nucleotide alleles and haplotypes that confer a survival advantage on tumor cells when somatically amplified. In this work, we focus on evaluating ADT's power to detect such causal variants across a variety of tumor dataset scenarios.
To this end, we generated multiple parameter-based synthetic datasets, derived from real data, that contain somatic copy number aberrations (CNAs) of various lengths and frequencies over germline single nucleotide polymorphisms (SNPs) genome-wide. Gold-standard causal sub-regions were assigned within these CNAs, and ADT's ability to detect these sub-regions was then assessed. Results indicate that ADT has high sensitivity and specificity at large sample sizes across most parameter settings, including those that more closely reflect existing SNP and CNA cancer data.
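As an illustration of the evaluation described above, the following minimal sketch computes sensitivity and specificity by comparing a set of detected SNPs against a gold-standard causal set. The function name, the per-SNP unit of comparison, and the example SNP indices are assumptions made for illustration; they are not taken from the ADT study, which may score detection differently.

```python
def sensitivity_specificity(all_snps, causal_snps, detected_snps):
    """Return (sensitivity, specificity) for a per-SNP comparison.

    Hypothetical helper: all_snps is every SNP tested, causal_snps is the
    gold-standard causal set, detected_snps is what the method flagged.
    """
    universe = set(all_snps)
    causal = set(causal_snps) & universe
    detected = set(detected_snps) & universe

    tp = len(causal & detected)          # causal SNPs that were flagged
    fn = len(causal - detected)          # causal SNPs that were missed
    fp = len(detected - causal)          # non-causal SNPs that were flagged
    tn = len(universe) - tp - fn - fp    # non-causal SNPs left unflagged

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy example with made-up SNP indices (not real data).
sens, spec = sensitivity_specificity(range(10), {2, 3, 4}, {3, 4, 5})
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Repeating such a comparison over each synthetic dataset would give one sensitivity and specificity pair per parameter setting.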