Copy number aberrations from Affymetrix SNP 6.0 genotyping data - how accurate are commonly used prediction approaches?

Author information

Pitea Adriana, Kondofersky Ivan, Sass Steffen, Theis Fabian J, Mueller Nikola S, Unger Kristian

Affiliations

Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany.

Research Unit Radiation Cytogenetics, Helmholtz Zentrum München, Neuherberg, Germany.

Publication information

Brief Bioinform. 2020 Jan 17;21(1):272-281. doi: 10.1093/bib/bby096.

Abstract

Copy number aberrations (CNAs) are known to strongly affect oncogenes and tumour suppressor genes. Given the critical role CNAs play in cancer research, it is essential to accurately identify CNAs from tumour genomes. One particular challenge in finding CNAs is the effect of confounding variables. To address this issue, we assessed how commonly used CNA identification algorithms perform on SNP 6.0 genotyping data in the presence of confounding variables. We simulated realistic synthetic data with varying levels of three confounding variables: the tumour purity, the length of a copy number region and the CNA burden (the percentage of CNAs present in a profiled genome). On these data we evaluated the performance of OncoSNP, ASCAT, GenoCNA, GISTIC and CGHcall. Furthermore, we implemented and assessed CGHcall*, an adjusted version of CGHcall that accounts for high CNA burden. Our analysis of the synthetic data indicates that tumour purity and CNA burden strongly influence the performance of all the algorithms. No algorithm correctly identified lost and gained genomic regions across all tumour purities. The length of CNA regions influenced the performance of ASCAT, CGHcall and GISTIC, whereas OncoSNP, GenoCNA and CGHcall* showed little sensitivity to it. Overall, CGHcall* and OncoSNP showed reasonable performance, particularly in samples with high tumour purity. Our analysis of the HapMap data revealed a good overlap between the CGHcall, CGHcall* and GenoCNA results and experimentally validated data. Our exploratory analysis of the TCGA HNSCC data revealed plausible results for CGHcall, CGHcall* and GISTIC in consensus HNSCC CNA regions. Code is available at https://github.com/adspit/PASCAL.
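To give a feel for why tumour purity and CNA burden matter, the following is a minimal illustrative sketch in Python. It is not the authors' simulation code (that is in the PASCAL repository). It assumes a simple mixture model in which normal cells carry two copies and the expected log2 ratio is computed relative to a diploid baseline. The function names `expected_log2_ratio` and `cna_burden`, and the `(start, end, copy_number)` segment layout, are hypothetical and chosen only for this example.

```python
import math


def expected_log2_ratio(tumour_copies, purity, normal_copies=2):
    """Expected log2 ratio of a region in a tumour/normal cell mixture.

    Assumption: the signal is the purity-weighted average copy number,
    divided by the copy number of the normal (diploid) baseline.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    mixed = purity * tumour_copies + (1.0 - purity) * normal_copies
    if mixed <= 0:
        # Homozygous deletion in a 100% pure sample: no signal at all.
        return float("-inf")
    return math.log2(mixed / normal_copies)


def cna_burden(segments, normal_copies=2):
    """Percentage of the profiled length covered by non-normal segments.

    `segments` is a list of (start, end, copy_number) tuples, assumed to be
    non-overlapping; this layout is an assumption made for the example.
    """
    total = sum(end - start for start, end, _ in segments)
    if total == 0:
        return 0.0
    altered = sum(end - start for start, end, cn in segments
                  if cn != normal_copies)
    return 100.0 * altered / total


if __name__ == "__main__":
    # A single-copy gain becomes harder to see as purity drops.
    for purity in (1.0, 0.7, 0.3):
        print(purity, round(expected_log2_ratio(3, purity), 3))
    # Toy profile: half of the profiled length is altered.
    print(cna_burden([(0, 100, 2), (100, 150, 3), (150, 200, 1)]))
```

Under this simplified model, a single-copy gain or loss moves towards a log2 ratio of zero as purity falls, which is one intuitive reason calling becomes harder in low-purity samples. A high CNA burden has a different effect: when much of the genome is altered, a baseline estimated from the data can shift away from the true normal level.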
