Department of Science, University of Sannio, 82100, Benevento, Italy.
Bioinformatics. 2011 Nov 1;27(21):2949-56. doi: 10.1093/bioinformatics/btr488. Epub 2011 Aug 25.
Copy number alterations (CNAs) are an important component of genetic variation and play a significant role in many human diseases. The development of array comparative genomic hybridization (aCGH) technology has made it possible to identify CNAs. Identifying recurrent CNAs is the first fundamental step in producing a list of genomic regions that form the basis for further biological investigation. The main problem in recurrent CNA discovery is the need to distinguish functional changes from random events without pathological relevance. Within-sample homogeneity is a common feature of copy number profiles in cancer, so it can be used as an additional source of information to increase the accuracy of the results. Although several algorithms for identifying recurrent CNAs have been proposed, no comprehensive comparison of the different approaches has yet been published.
We propose a new approach, called Genomic Analysis of Important Alterations (GAIA), to find recurrent CNAs, in which a statistical hypothesis framework is extended to take within-sample homogeneity into account. Statistical significance and within-sample homogeneity are combined in an iterative procedure to extract the regions that are likely to be involved in functional changes. Results show that GAIA is a valid alternative to other proposed approaches. In addition, we perform an accurate comparison using two real aCGH datasets and a carefully planned simulation study.
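To make the idea of testing recurrence for statistical significance concrete, the following is a minimal sketch of a generic permutation test on a binary aberration matrix. It is not GAIA's published procedure: it ignores within-sample homogeneity and the iterative region extraction, and the function name `recurrence_pvalues`, its parameters, and the max-statistic null are assumptions chosen for illustration.

```python
import numpy as np


def recurrence_pvalues(aberrations, n_permutations=1000, seed=0):
    """Permutation p-values for per-probe recurrence (illustrative only).

    aberrations: (n_samples, n_probes) array of 0/1 flags, where 1 means
    the probe is altered in that sample. Hypothetical input format.
    """
    rng = np.random.default_rng(seed)
    aberrations = np.asarray(aberrations, dtype=int)

    # Observed recurrence: number of samples altered at each probe.
    observed = aberrations.sum(axis=0)

    # Count how often the null maximum reaches each observed value.
    exceed = np.zeros(observed.shape, dtype=int)
    for _ in range(n_permutations):
        # Shuffle probe positions independently within each sample, which
        # keeps every sample's alteration count but destroys recurrence.
        shuffled = rng.permuted(aberrations, axis=1)
        null_max = shuffled.sum(axis=0).max()
        exceed += null_max >= observed

    # Add-one correction so that no p-value is exactly zero.
    return (exceed + 1) / (n_permutations + 1)
```

Comparing each probe against the maximum of the permuted recurrence counts is one simple way to account for testing many probes at once; other corrections, such as false discovery rate control, would be equally reasonable in a sketch like this.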
GAIA has been implemented as an R/Bioconductor package. It can be downloaded from http://bioinformatics.biogem.it/download/gaia.
ceccarelli@unisannio.it; morganella@unisannio.it.
Supplementary data are available at Bioinformatics online.