Department of Biostatistics, State University of New York at Buffalo, Buffalo, NY, USA.
Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY, USA.
Nat Commun. 2021 May 24;12(1):3051. doi: 10.1038/s41467-021-23094-z.
The vast majority of somatic mutations in a typical cancer are either extremely rare or have never been recorded in the available databases that track somatic mutations. These mutations constitute a hidden genome, in contrast to the relatively small number of frequently occurring mutations, whose properties have been studied in depth. Here we demonstrate that this hidden genome contains much more accurate information than common mutations for identifying the site of origin of primary cancers in settings where it is unknown. We accomplish this with a projection-based statistical method that achieves highly effective signal condensation by leveraging DNA sequence and epigenetic contexts through a set of meta-features that embody the mutation contexts of rare variants throughout the genome.
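The abstract does not specify the projection itself, so the following is only a toy sketch of the general idea of condensing a sparse variant matrix onto a few meta-features. It assumes a binary tumor-by-variant matrix and a binary variant-by-context membership matrix; the names `X`, `M`, and `project`, and the two contexts, are hypothetical and are not taken from the paper.

```python
import numpy as np

# Toy data (made up for illustration): 3 tumors x 5 rare variants,
# where 1 means the variant was observed in that tumor.
X = np.array([
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
])

# Hypothetical meta-feature membership: 5 variants x 2 mutation contexts.
# Each row marks which (made-up) context a variant belongs to.
M = np.array([
    [1, 0],
    [0, 1],
    [1, 0],
    [0, 1],
    [1, 0],
])


def project(variant_matrix, meta_features):
    """Condense per-variant indicators into per-context counts.

    This is a plain matrix product, used here as a stand-in for a
    projection step; it is an assumption, not the authors' estimator.
    """
    return variant_matrix @ meta_features


# Z has one row per tumor and one column per meta-feature.
Z = project(X, M)
print(Z)
```

In this sketch each rare variant on its own appears in only one tumor, but after projection every tumor is described by a short, dense vector of context counts that a downstream classifier could use.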