Wu Long Yang, Chipman Hugh A, Bull Shelley B, Briollais Laurent, Wang Kesheng
Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada.
Bioinformatics. 2009 Jul 1;25(13):1669-79. doi: 10.1093/bioinformatics/btp270. Epub 2009 Apr 23.
Efficient and accurate ascertainment of copy number variations (CNVs) at the population level is essential for understanding evolutionary processes and population genetics, and for applying CNVs in population-based genome-wide association studies of complex human diseases. We propose a novel Bayesian segmentation approach to identify CNVs in a defined population of any size. It is computationally efficient and provides statistical evidence for the detected CNVs through the Bayes factor. This approach has the unique feature of carrying out segmentation and assigning copy number status simultaneously, a desirable property that current segmentation methods do not share.
In comparisons with popular two-step segmentation methods for a single individual using benchmark simulation studies, we find the new approach to perform competitively with respect to false discovery rate and sensitivity in breakpoint detection. In a simulation study of multiple samples with recurrent copy numbers, the new approach outperforms two leading single sample methods. We further demonstrate the effectiveness of our approach in population-level analysis of previously published HapMap data. We also apply our approach in studying population genetics of CNVs.
R programs are available at http://www.mshri.on.ca/mitacs/software/SOFTWARE.HTML
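As an illustration only of how a Bayes factor can serve as evidence for a copy number call, the sketch below scores a single candidate segment of log2 intensity ratios. It is not the authors' model or their R code: the normal likelihood with known noise level `sigma`, the zero-centred normal prior with scale `tau` on the shifted mean, the evidence threshold, and the function names `log_bayes_factor` and `call_state` are all assumptions made for this example.

```python
import math


def log_bayes_factor(values, sigma=0.2, tau=0.5):
    """Log Bayes factor of H1 (segment mean shifted) against H0 (mean zero).

    Assumed toy model, not the published one:
      values[i] ~ Normal(mu, sigma^2), with sigma known
      H0: mu = 0 (normal copy number)
      H1: mu ~ Normal(0, tau^2) (gain or loss)
    """
    n = len(values)
    if n == 0:
        raise ValueError("segment must contain at least one probe")
    ybar = sum(values) / n
    v0 = sigma ** 2 / n   # variance of the segment mean under H0
    v1 = v0 + tau ** 2    # marginal variance of the segment mean under H1
    # Ratio of two zero-mean normal densities evaluated at ybar, on the log scale.
    return 0.5 * math.log(v0 / v1) + 0.5 * ybar ** 2 * (1.0 / v0 - 1.0 / v1)


def call_state(values, threshold=math.log(10), sigma=0.2, tau=0.5):
    """Return (state, log_bf); the threshold is an arbitrary illustrative choice."""
    log_bf = log_bayes_factor(values, sigma=sigma, tau=tau)
    if log_bf <= threshold:
        return "normal", log_bf
    ybar = sum(values) / len(values)
    return ("gain" if ybar > 0 else "loss"), log_bf


if __name__ == "__main__":
    for segment in ([0.0] * 10, [0.5] * 10, [-0.5] * 10):
        state, log_bf = call_state(segment)
        print(state, round(log_bf, 2))
```

Because the call and its evidence come from the same quantity, this toy version hints at why assigning copy number status during segmentation, rather than in a second step, can be attractive; a population-level method would additionally have to share information across samples, which this sketch does not attempt.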