A regression model for estimating DNA copy number applied to capture sequencing data.

Affiliation

Department of Bioinformatics and Statistics, The Netherlands Cancer Institute, Amsterdam, The Netherlands.

Publication information

Bioinformatics. 2012 Sep 15;28(18):2357-65. doi: 10.1093/bioinformatics/bts448. Epub 2012 Jul 13.

Abstract

MOTIVATION

Target enrichment, also referred to as DNA capture, provides an effective way to focus sequencing efforts on a genomic region of interest. Capture data are typically used to detect single-nucleotide variants. They can also be used to detect copy number alterations, which is particularly useful in the context of cancer, where such changes occur frequently. In copy number analysis, it is common practice to determine log-ratios between test and control samples, but this approach loses information because it disregards the total coverage or intensity at a locus.

RESULTS

We modeled the coverage or intensity of the test sample as a linear function of the control sample. This regression approach is able to deal with regions that are completely deleted, which are problematic for methods that use log-ratios. To demonstrate the utility of our approach, we used capture data to determine copy number for a set of 600 genes in a panel of nine breast cancer cell lines. We found high concordance between our results and those generated using a single-nucleotide polymorphism (SNP) genotyping platform. When we compared our results with log-ratio-based methods, including ExomeCNV, we found that our approach produced better overall correlation with SNP data.
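
The contrast between a regression estimate and a log-ratio can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an ordinary least-squares line through the origin fitted per region, and the function names and coverage values are hypothetical, chosen only to show why a completely deleted region is awkward for log-ratios but not for a fitted slope.

```python
import math


def regression_slope(control, test):
    """Least-squares slope of test on control for a line through the origin.

    Assumption for illustration: no intercept term and equal noise weights.
    """
    denom = sum(x * x for x in control)
    if denom == 0:
        raise ValueError("control coverage is zero at every position")
    return sum(x * y for x, y in zip(control, test)) / denom


def mean_log2_ratio(control, test):
    """Mean per-position log2(test / control).

    Returns None when any coverage value is not positive, because the
    logarithm is undefined there.
    """
    if any(x <= 0 or y <= 0 for x, y in zip(control, test)):
        return None
    ratios = [math.log2(y / x) for x, y in zip(control, test)]
    return sum(ratios) / len(ratios)


# Hypothetical per-position coverage, for illustration only.
control = [100, 200, 150, 50]
regions = {
    "neutral": [100, 200, 150, 50],
    "single_copy_loss": [50, 100, 75, 25],
    "homozygous_deletion": [0, 0, 0, 0],
}

for name, test in regions.items():
    slope = regression_slope(control, test)
    log_ratio = mean_log2_ratio(control, test)
    print(f"{name}: slope={slope:.2f}, mean log2 ratio={log_ratio}")
```

In this sketch the deleted region yields a slope of zero, whereas the log-ratio is undefined. The through-origin slope also weights positions with higher control coverage more heavily, which is one way a regression can use the total coverage that a plain ratio discards.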

AVAILABILITY

The algorithm is implemented in C and R and the code can be downloaded from http://bioinformatics.nki.nl/ocs/

CONTACT

l.wessels@nki.nl

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
