Institute of Molecular Life Science, University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland.
SIB, Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057, Zurich, Switzerland.
Sci Rep. 2020 Mar 16;10(1):4846. doi: 10.1038/s41598-020-61854-x.
In many cancers, incidence, treatment efficacy and overall prognosis vary between geographic populations. Studies disentangling the contributing factors may help both in understanding cancer biology and in tailoring therapeutic interventions. Ancestry estimation in such studies should preferably be driven by genomic data, because self-reported or inferred metadata are frequently missing or erroneous. While corresponding algorithms have been demonstrated for baseline genomes, such a strategy has not been shown for cancer genomes carrying a substantial somatic mutation load. We have developed a bioinformatics tool that assigns population groups from genome profiling data for both unaltered and cancer genomes. Despite extensive somatic mutations in the cancer genomes, consistency between germline and cancer data reached 97% and 92% for assignment into 5 and 26 ancestral groups, respectively. Comparison with self-reported metadata yielded a matching rate of 88-92%, limited mostly by the interpretation of self-reported ethnicity labels relative to the standardized mapping output. Our SNP2pop application infers population information from both SNP array and sequencing data and estimates the population structure in cancer genomics projects, to facilitate research into the interplay between ethnicity-related genetic background, environmental factors and somatic mutation patterns in cancer biology.