Kachouie Nezamoddin N, Lin Xihong, Christiani David C, Schwartzman Armin
Department of Mathematical Sciences, Florida Institute of Technology.
Department of Statistics, Harvard School of Public Health.
Commun Stat Case Stud Data Anal Appl. 2015;1(4):206-216. doi: 10.1080/23737484.2016.1197079. Epub 2016 Jul 18.
Advances in genomic sequencing have prompted the development of new computational methods for studying the genomic sources of human diseases. This paper presents a recent statistical approach for detecting local regions with significant copy number alterations (CNAs) in a lung cancer population. Mapping such regions is of interest because they are potentially associated with lung cancer. The conventional application of multiple testing methods tests for CNAs at each probe separately and thresholds the resulting t-statistics. Because of the large number of probes, this approach often fails to detect CNA regions. In contrast, the proposed method uses the heights of located peaks, which improves detection power. This is achieved by taking advantage of the spatial structure in the data and by reducing the number of tests in the multiple comparisons problem. In copy number analysis, it is common to apply segmentation or change detection tools to each individual genomic sample. However, because segmentation results vary among subjects, it becomes difficult to find common genomic regions in population analyses. Our approach avoids this problem by analyzing summary statistics directly at the population level, so region detection is performed on the summary t-statistic map. The proposed method is applied to lung cancer data and shows promise for detecting local regions with significant CNAs.
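The peak-based idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the moving-average smoother, the sign-flip permutation null for peak heights, and the Benjamini-Hochberg step are all assumptions chosen for illustration, and the abstract does not specify how peak-height p-values are actually obtained. The sketch builds a per-probe t-statistic map across subjects, smooths it, locates local maxima, and tests only those peaks rather than every probe.

```python
import math
import random
import statistics


def t_map(samples):
    """One-sample t-statistic per probe (H0: mean log-ratio is 0)."""
    n = len(samples)
    stats = []
    for probe_values in zip(*samples):  # iterate over probes (columns)
        mean = statistics.fmean(probe_values)
        sd = statistics.stdev(probe_values)
        stats.append(mean / (sd / math.sqrt(n)) if sd > 0 else 0.0)
    return stats


def smooth(values, half_width=2):
    """Moving average; the window is truncated at both ends."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half_width): i + half_width + 1]
        out.append(sum(window) / len(window))
    return out


def local_peaks(values):
    """Indices of interior local maxima."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]


def peak_p_values(samples, n_perm=200, half_width=2, seed=0):
    """Permutation p-value for the height of each observed peak.

    Assumption: the null is built by flipping the sign of each subject's
    whole profile, which keeps the spatial correlation along the genome.
    """
    rng = random.Random(seed)
    observed = smooth(t_map(samples), half_width)
    null_heights = []
    for _ in range(n_perm):
        flipped = []
        for row in samples:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * v for v in row])
        null_map = smooth(t_map(flipped), half_width)
        null_heights.extend(null_map[i] for i in local_peaks(null_map))
    p_values = {}
    for i in local_peaks(observed):
        exceed = sum(h >= observed[i] for h in null_heights)
        p_values[i] = (exceed + 1) / (len(null_heights) + 1)
    return observed, p_values


def benjamini_hochberg(p_values, alpha=0.05):
    """Keys rejected by the Benjamini-Hochberg step-up procedure."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    cutoff = 0
    for rank, (_, p) in enumerate(ordered, start=1):
        if p <= alpha * rank / m:
            cutoff = rank
    return sorted(key for key, _ in ordered[:cutoff])


def simulate_cohort(n_subjects=20, n_probes=60, gain=(25, 35), shift=1.5, seed=1):
    """Hypothetical data: Gaussian noise plus a shared gain region."""
    rng = random.Random(seed)
    return [[rng.gauss(shift if gain[0] <= j < gain[1] else 0.0, 1.0)
             for j in range(n_probes)]
            for _ in range(n_subjects)]


if __name__ == "__main__":
    cohort = simulate_cohort()
    _, p_vals = peak_p_values(cohort)
    print("significant peak positions:", benjamini_hochberg(p_vals))
```

Because only the located peaks are tested, the multiple-comparisons correction is applied to a handful of candidates instead of one test per probe, which is the source of the power gain the abstract describes. This sketch looks only for gains (positive peaks); losses could be handled by running the same steps on the negated map.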