Comparison of methods to detect copy number alterations in cancer using simulated and real genotyping data.

Affiliation

Genome Analysis Platform, CIC bioGUNE-CIBERehd, Technologic Park of Bizkaia, Building 502, 48160 Derio, Spain.

Publication information

BMC Bioinformatics. 2012 Aug 7;13:192. doi: 10.1186/1471-2105-13-192.

Abstract

BACKGROUND

The detection of genomic copy number alterations (CNA) in cancer based on SNP arrays requires methods that take into account tumour-specific factors such as normal cell contamination and tumour heterogeneity. A number of tools have recently been developed, but their performance has yet to be thoroughly assessed. To this end, a comprehensive model is indispensable: one that integrates normal cell contamination and intra-tumour heterogeneity and that can be translated into synthetic data on which to perform benchmarks.
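To illustrate why normal cell contamination matters, the sketch below computes the expected SNP-array signals at a heterozygous locus for a mixture of tumour and normal cells. It uses the standard linear mixture idealisation and is not the CnaGen model, whose details are not given in this abstract. The function name `expected_signals` and its parameters are hypothetical, and platform effects such as noise and signal compression are ignored.

```python
import math


def expected_signals(total_cn, b_copies, normal_fraction):
    """Expected log R ratio (LRR) and B allele frequency (BAF) at a SNP that is
    heterozygous (AB) in normal cells, for a tumour/normal mixture.

    Assumption: simple linear mixing of a diploid normal genome with a tumour
    genome carrying `total_cn` copies, `b_copies` of which are the B allele.
    """
    if not 0.0 <= normal_fraction <= 1.0:
        raise ValueError("normal_fraction must lie in [0, 1]")
    if not 0 <= b_copies <= total_cn:
        raise ValueError("b_copies must lie in [0, total_cn]")

    tumour_fraction = 1.0 - normal_fraction
    # Average number of copies (all alleles, and B allele only) per cell.
    mixed_total = 2.0 * normal_fraction + total_cn * tumour_fraction
    mixed_b = 1.0 * normal_fraction + b_copies * tumour_fraction

    if mixed_total == 0.0:
        # Homozygous deletion in a pure tumour: no signal at all.
        return float("-inf"), float("nan")

    lrr = math.log2(mixed_total / 2.0)
    baf = mixed_b / mixed_total
    return lrr, baf


# A hemizygous deletion (one A copy left) at increasing contamination levels.
for p in (0.0, 0.5, 0.9):
    lrr, baf = expected_signals(total_cn=1, b_copies=0, normal_fraction=p)
    print(f"normal fraction {p:.1f}: LRR = {lrr:.3f}, BAF = {baf:.3f}")
```

Under these assumptions, the deletion signal shrinks towards the normal values (LRR 0, BAF 0.5) as the normal fraction grows, which is why detection becomes harder in heavily contaminated samples.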

RESULTS

We propose such a model and implement it in an R package called CnaGen, which synthetically generates a wide range of alterations under different levels of normal cell contamination. Six recently published methods for CNA and loss of heterozygosity (LOH) detection in tumour samples were assessed on these synthetic data and on a dilution series of a breast cancer cell line: ASCAT, GAP, GenoCNA, GPHMM, MixHMM and OncoSNP. We report the recall rates as a function of normal cell contamination level and alteration characteristics (length, copy number and LOH state), as well as the false discovery rate distribution for each copy number under different normal cell contamination levels. The assessed methods are in general better at detecting alterations with low copy number and under low levels of normal cell contamination. All methods except GPHMM, which failed to recognize the alteration pattern in the cell-line samples, provided similar results for the synthetic and cell-line sample sets. MixHMM and GenoCNA were the worst-performing methods, while GAP generally performed better. This supports the viability of approaches other than the common hidden Markov model (HMM)-based ones.
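As an illustration of the two reported metrics, the following minimal sketch computes recall and false discovery rate from per-SNP copy number calls. The exact definitions used in the study are not stated in this abstract, so the per-SNP scoring rule, the function name `recall_and_fdr` and the toy data are all assumptions made for the example.

```python
def recall_and_fdr(truth, calls, normal_cn=2):
    """Per-SNP recall and false discovery rate for copy number calls.

    Assumed scoring rule (hypothetical, for illustration only):
      - a SNP is truly altered when its true copy number differs from normal_cn;
      - a call is a discovery when the called copy number differs from normal_cn;
      - a discovery is correct only when it matches the true copy number.
    """
    if len(truth) != len(calls):
        raise ValueError("truth and calls must have the same length")

    altered = sum(1 for t in truth if t != normal_cn)
    discoveries = sum(1 for c in calls if c != normal_cn)
    correct = sum(1 for t, c in zip(truth, calls) if c != normal_cn and c == t)

    # NaN signals that a metric is undefined (nothing to recall or discover).
    recall = correct / altered if altered else float("nan")
    fdr = (discoveries - correct) / discoveries if discoveries else float("nan")
    return recall, fdr


# Toy example: eight SNPs with a one-copy loss and a one-copy gain.
truth = [2, 2, 1, 1, 3, 3, 2, 2]
calls = [2, 1, 1, 1, 3, 2, 2, 2]
recall, fdr = recall_and_fdr(truth, calls)
print(f"recall = {recall:.2f}, FDR = {fdr:.2f}")  # recall = 0.75, FDR = 0.25
```

Stratifying such counts by contamination level, alteration length, copy number or LOH state would give breakdowns of the kind reported above.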

CONCLUSIONS

We devised and implemented a comprehensive model to generate data that simulate tumoural samples genotyped using SNP arrays. The validity of the model is supported by the similarity of the results obtained with synthetic and real data. Based on these results and on the software implementation of the methods, we recommend GAP for advanced users and GPHMM for a fully driven analysis.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60f3/3472297/d8a45f2ee5dd/1471-2105-13-192-1.jpg
