Institute of Bioinformatics, Johannes Kepler University Linz, Linz, Austria.
Nucleic Acids Res. 2011 Jul;39(12):e79. doi: 10.1093/nar/gkr197. Epub 2011 Apr 12.
Cost-effective oligonucleotide genotyping arrays like the Affymetrix SNP 6.0 are still the predominant technique for measuring DNA copy number variations (CNVs). However, CNV detection methods for microarrays overestimate both the number and the size of CNV regions and, consequently, suffer from a high false discovery rate (FDR). A high FDR means that many CNVs are wrongly detected and are therefore not associated with a disease in a clinical study; correction for multiple testing must nevertheless take them into account, which decreases the study's discovery power. To control the FDR, we propose a probabilistic latent variable model, 'cn.FARMS', which is optimized by a Bayesian maximum a posteriori approach. cn.FARMS controls the FDR through the information gain of the posterior over the prior. The prior represents the null hypothesis of copy number 2 for all samples, from which the posterior can deviate only through strong and consistent signals in the data. On HapMap data, cn.FARMS clearly outperformed the two most prevalent methods with respect to sensitivity and FDR. The software cn.FARMS is publicly available as an R package at http://www.bioinf.jku.at/software/cnfarms/cnfarms.html.
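The idea of filtering by the information gain of the posterior over the prior can be illustrated with a minimal sketch. The abstract does not state the exact measure, so this example assumes, purely for illustration, a one-dimensional latent variable with a standard normal prior N(0, 1), a Gaussian posterior N(mean, variance), and the Kullback-Leibler divergence as the information gain. The function names and the threshold are hypothetical and are not part of the cn.FARMS R package.

```python
import math


def gaussian_information_gain(post_mean: float, post_var: float) -> float:
    """KL divergence (in nats) of the posterior N(post_mean, post_var)
    from an assumed standard normal prior N(0, 1).

    Assumption: KL divergence stands in for "information gain" here;
    the abstract does not specify the measure used by cn.FARMS.
    """
    if post_var <= 0:
        raise ValueError("posterior variance must be positive")
    return 0.5 * (post_var + post_mean ** 2 - 1.0 - math.log(post_var))


def flag_candidate_regions(posteriors, threshold):
    """Return indices of regions whose posterior deviates from the prior
    by more than a (hypothetical) information gain threshold.

    posteriors: iterable of (mean, variance) pairs, one per region.
    """
    return [
        i
        for i, (mean, var) in enumerate(posteriors)
        if gaussian_information_gain(mean, var) > threshold
    ]


if __name__ == "__main__":
    # Illustrative values only, not real array data.
    regions = [(0.0, 1.0), (2.0, 0.5), (0.1, 0.9)]
    print(flag_candidate_regions(regions, threshold=0.5))
```

Under these assumptions, a posterior identical to the prior yields zero gain, so a region is flagged only when the data pull the posterior mean away from the null or sharpen it considerably. This mirrors the behaviour described above, in which only strong and consistent signals move the posterior away from the copy number 2 hypothesis.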