Joint estimation of copy number variation and reference intensities on multiple DNA arrays using GADA.

Author information

Pique-Regi Roger, Ortega Antonio, Asgharzadeh Shahab

Affiliations

Signal and Image Processing Institute, Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, EEB 400, 3740 McClintock Ave, Los Angeles, CA 90089-2564, USA.

Publication information

Bioinformatics. 2009 May 15;25(10):1223-30. doi: 10.1093/bioinformatics/btp119. Epub 2009 Mar 10.

DOI:10.1093/bioinformatics/btp119

PMID:19276152

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2732310/

Abstract

MOTIVATION

The complexity of a large number of recently discovered copy number polymorphisms is much higher than initially thought, thus making it more difficult to detect them in the presence of significant measurement noise. In this scenario, separate normalization and segmentation is prone to lead to many false detections of changes in copy number. New approaches capable of jointly modeling the copy number and the non-copy number (noise) hybridization effects across multiple samples will potentially lead to more accurate results.

METHODS

In this article, the genome alteration detection analysis (GADA) approach introduced in our previous work is extended to a multiple sample model. The copy number component is independent for each sample and uses a sparse Bayesian prior, while the reference hybridization level is not necessarily sparse but identical on all samples. The expectation maximization (EM) algorithm used to fit the model iteratively determines whether the observed hybridization levels are more likely due to a copy number variation or to a shared hybridization bias.
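
The split between a shared reference level and per-sample copy number deviations can be illustrated with a small sketch. This is not the GADA algorithm: it replaces the sparse Bayesian prior and the EM fit with a plain alternating scheme (per-probe averaging for the reference, hard thresholding for the copy number component), and it does not model breakpoints along the genome. The function names `hard_threshold` and `joint_fit`, the `threshold` value, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def hard_threshold(v, t):
    """Keep entries whose magnitude exceeds t; set the rest to exactly 0."""
    return np.where(np.abs(v) > t, v, 0.0)


def joint_fit(y, threshold=0.5, n_iter=20):
    """Alternate between a shared reference and sparse per-sample deviations.

    y: array of shape (n_probes, n_samples) of log-intensities.
    Returns (r, x): r has shape (n_probes,), x has the same shape as y.
    Illustrative stand-in only, not GADA's sparse Bayesian EM.
    """
    x = np.zeros_like(y, dtype=float)
    for _ in range(n_iter):
        # Reference: per-probe average of what copy number does not explain.
        r = (y - x).mean(axis=1)
        # Copy number: per-sample residual, kept only where it is large.
        x = hard_threshold(y - r[:, None], threshold)
    return r, x


# Synthetic example (hypothetical data): 50 probes, 6 samples.
rng = np.random.default_rng(0)
bias = rng.normal(0.0, 0.3, size=50)             # shared hybridization bias
y = bias[:, None] + 0.05 * rng.normal(size=(50, 6))
y[10:20, 0] += 1.0                               # a gain in sample 0 only

r, x = joint_fit(y)
print("max reference error:", np.abs(r - bias).max())
print("nonzero copy number entries:", np.count_nonzero(x))
```

In this toy setting, the first pass lets the gain in sample 0 leak into the reference estimate for probes 10 to 19; each later pass attributes more of that signal to the sample-specific component, so the leak shrinks. This mirrors, in a much cruder form, the iterative decision the abstract describes between a copy number variation and a shared hybridization bias.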

RESULTS

The newly proposed approach is compared with the currently used strategy of separate normalization followed by independent segmentation of each array. Real microarray data obtained from HapMap samples are randomly partitioned to create different reference sets. With the new approach, copy number and reference intensity estimates vary significantly less when the reference set changes, and the copy numbers detected within HapMap family trios are more consistent. Finally, the running time to fit the model grows linearly in the number of samples and probes.

AVAILABILITY

http://biron.usc.edu/~piquereg/GADA.
