Lin Peng, Hartz Sarah M, Wang Jen-Chyong, Krueger Robert F, Foroud Tatiana M, Edenberg Howard J, Nurnberger John I, Brooks Andrew I, Tischfield Jay A, Almasy Laura, Webb Bradley T, Hesselbrock Victor M, Porjesz Bernice, Goate Alison M, Bierut Laura J, Rice John P
Department of Psychiatry, Washington University, St. Louis, MO 63110, USA.
Hum Hered. 2011;71(3):141-7. doi: 10.1159/000324683. Epub 2011 Jul 20.
BACKGROUND/AIM: Copy number variations (CNVs) are a major source of genetic variation among individuals and a potential risk factor in many diseases. Numerous diseases have been linked to deletions and duplications of these chromosomal segments. Several computer programs can identify CNVs from genome-wide association study data and other microarray data, but the reliability of their results has been questioned.
METHODS: To help researchers reduce the number of false-positive CNVs that require follow-up laboratory testing, we evaluated the relative performance of CNVPartition, PennCNV and QuantiSNP. We also developed a statistical method for estimating the sensitivity and positive predictive value of CNV calls and tested it on 96 duplicate samples in our dataset.
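As an illustration of how duplicate samples can be used to check CNV calls, the sketch below computes a simple reproducibility rate: the share of calls in one sample that are matched by a call in its duplicate. It is not the statistical method developed in the paper, which the abstract does not describe. The `Cnv` record, the function names and the 50% reciprocal-overlap threshold are all assumptions made for this example.

```python
from typing import NamedTuple, Sequence


class Cnv(NamedTuple):
    """Hypothetical CNV call record (not the output format of any real program)."""
    chrom: str
    start: int  # start position in bp, inclusive
    end: int    # end position in bp, exclusive
    kind: str   # "del" or "dup"


def overlap_fraction(a: Cnv, b: Cnv) -> float:
    """Fraction of a's length covered by b; 0 if chromosome or type differ."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    return max(shared, 0) / (a.end - a.start)


def reproducibility_rate(calls_a: Sequence[Cnv], calls_b: Sequence[Cnv],
                         min_overlap: float = 0.5) -> float:
    """Share of calls in calls_a matched by a call in the duplicate sample.

    A match requires reciprocal overlap of at least min_overlap
    (an assumed threshold, not one taken from the paper).
    """
    if not calls_a:
        return 0.0
    replicated = sum(
        any(
            overlap_fraction(a, b) >= min_overlap
            and overlap_fraction(b, a) >= min_overlap
            for b in calls_b
        )
        for a in calls_a
    )
    return replicated / len(calls_a)


# Toy data: one deletion is seen in both duplicates, one duplication is not.
sample = [Cnv("1", 100, 200, "del"), Cnv("2", 0, 1000, "dup")]
duplicate = [Cnv("1", 120, 210, "del")]
rate = reproducibility_rate(sample, duplicate)
```

A low reproducibility rate does not by itself say whether the problem is missed true CNVs or false-positive calls, which is why sensitivity and positive predictive value need a separate estimation method.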
RESULTS: The positive predictive value increased with the number of probes in a CNV and with the size of the CNV, and was highest for CNVs of at least 500 kb containing at least 100 probes. Our analysis also indicates that restricting attention to CNVs reported by multiple programs can greatly improve the reproducibility rate and the positive predictive value.
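A minimal sketch of how these two findings might be combined into a filter is shown below. The 500 kb and 100-probe thresholds come from the result above; the `CnvCall` record, the function names, the any-overlap matching rule and the default of two supporting programs are assumptions for illustration, not the authors' procedure.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class CnvCall:
    """Hypothetical merged call record; real program outputs differ in format."""
    program: str   # e.g. "PennCNV", "QuantiSNP", "CNVPartition"
    chrom: str
    start: int     # bp, inclusive
    end: int       # bp, exclusive
    n_probes: int


def passes_size_filter(call: CnvCall, min_bp: int = 500_000,
                       min_probes: int = 100) -> bool:
    """True if the call is at least min_bp long and spans at least min_probes probes."""
    return (call.end - call.start) >= min_bp and call.n_probes >= min_probes


def overlaps(a: CnvCall, b: CnvCall) -> bool:
    """True if the two calls share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def supporting_programs(call: CnvCall, all_calls: List[CnvCall]) -> Set[str]:
    """Names of programs with a call overlapping this one (including its own)."""
    return {c.program for c in all_calls if overlaps(call, c)}


def high_confidence(calls: List[CnvCall], min_programs: int = 2) -> List[CnvCall]:
    """Keep calls that pass the size filter and are reported by enough programs."""
    return [
        c for c in calls
        if passes_size_filter(c)
        and len(supporting_programs(c, calls)) >= min_programs
    ]


# Toy calls for one sample (invented values, not data from the study).
calls = [
    CnvCall("PennCNV", "1", 1_000_000, 1_600_000, 150),
    CnvCall("QuantiSNP", "1", 1_050_000, 1_580_000, 140),
    CnvCall("CNVPartition", "2", 0, 600_000, 120),  # large, but one program only
    CnvCall("PennCNV", "3", 0, 10_000, 5),          # two programs, but too small
    CnvCall("QuantiSNP", "3", 0, 10_000, 5),
]
kept = high_confidence(calls)
```

Stricter filters of this kind trade sensitivity for positive predictive value, so the thresholds would need to be chosen with the follow-up budget in mind.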
CONCLUSION: Our methods can be used by investigators to identify CNVs in genome-wide data with greater reliability.