Niu Yue S, Zhang Heping
Department of Mathematics, University of Arizona, Tucson, Arizona 85721, USA
Ann Appl Stat. 2012 Sep;6(3):1306-1326. doi: 10.1214/12-AOAS539SUPP.
DNA copy number variation (CNV) has recently gained considerable interest as a source of genetic variation that likely influences phenotypic differences. Many statistical and computational methods have been proposed and applied to detect CNVs from data generated by genome analysis platforms. However, most algorithms are computationally intensive, with complexity at least $O(n^2)$, where $n$ is the number of probes in the experiments. Moreover, the theoretical properties of these existing methods are not well understood. A faster and better characterized algorithm is desirable for ultra-high-throughput data. In this study, we propose the Screening and Ranking algorithm (SaRa), which can detect CNVs quickly and accurately with complexity down to $O(n)$. In addition, we characterize the theoretical properties of our algorithm and present numerical analysis.
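As an illustration of how a screen-then-rank scan can stay close to linear in the number of probes, the following is a minimal sketch and not the authors' implementation. It assumes a simple local diagnostic (the difference between the means of the `h` probes to the right and the `h` probes to the left of each position), keeps positions where the absolute diagnostic is a local maximum above a cutoff, and ranks them by magnitude. The names `local_diagnostic` and `screen_and_rank`, the window `h`, and `threshold` are all hypothetical choices made for this example.

```python
from itertools import accumulate


def local_diagnostic(y, h):
    """Return d where d[i] = mean(y[i:i+h]) - mean(y[i-h:i]).

    Positions closer than h to either end are left at 0.0.
    Prefix sums make each window mean O(1), so the pass is O(n).
    """
    n = len(y)
    prefix = [0.0] + list(accumulate(y))  # prefix[k] == sum(y[:k])
    d = [0.0] * n
    for i in range(h, n - h + 1):
        right = prefix[i + h] - prefix[i]
        left = prefix[i] - prefix[i - h]
        d[i] = (right - left) / h
    return d


def screen_and_rank(y, h, threshold):
    """Screen for local maxima of |d| above threshold, then rank them.

    Assumed rule for this sketch: position i is kept when |d[i]| is the
    largest value within h positions on either side and is >= threshold.
    The window maximum is recomputed naively here, which costs O(n * h);
    for a fixed h that is still linear in n.
    """
    d = [abs(v) for v in local_diagnostic(y, h)]
    n = len(y)
    candidates = []
    for i in range(h, n - h + 1):
        lo, hi = max(0, i - h), min(n, i + h + 1)
        if d[i] >= threshold and d[i] == max(d[lo:hi]):
            candidates.append((d[i], i))
    candidates.sort(reverse=True)  # largest jump first
    return [i for _, i in candidates]


if __name__ == "__main__":
    # Noise-free toy signal with one jump; index 20 starts the new segment.
    signal = [0.0] * 20 + [1.0] * 20
    print(screen_and_rank(signal, h=5, threshold=0.5))
```

On real intensity data the window and cutoff would need to be chosen with the noise level in mind; the values in the toy call are for demonstration only.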