A Single-Array-Based Method for Detecting Copy Number Variants Using Affymetrix High Density SNP Arrays and its Application to Breast Cancer.

Authors

Li Ming, Wen Yalu, Fu Wenjiang

Affiliations

Division of Biostatistics, Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, AR, USA.

Department of Epidemiology and Biostatistics, Michigan State University, East Lansing MI, USA.

Publication information

Cancer Inform. 2015 Jul 16;13(Suppl 4):95-103. doi: 10.4137/CIN.S15203. eCollection 2014.

DOI:10.4137/CIN.S15203

PMID:26279618

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4519351/

Abstract

Accumulating evidence has shown that structural variations, due to insertions, deletions, and inversions of DNA, may contribute considerably to the development of complex human diseases, such as breast cancer. High-throughput genotyping technologies, such as Affymetrix high density single-nucleotide polymorphism (SNP) arrays, have produced large amounts of genetic data for genome-wide SNP genotype calling and copy number estimation. Meanwhile, there is a great need for accurate and efficient statistical methods to detect copy number variants. In this article, we introduce a hidden-Markov-model (HMM)-based method, referred to as PICR-CNV, for copy number inference. The proposed method first estimates copy number abundance for each SNP on a single array based on the raw fluorescence values, and then standardizes the estimated copy number abundance to place multiple arrays on an equal footing. This method requires no between-array normalization, and thus preserves data integrity and the independence of samples across individual subjects. In addition to applying new statistical techniques to the raw fluorescence values, we apply the HMM to the standardized copy number abundance in order to reduce experimental noise. Through simulations, we show that our refined method is able to infer copy number variants accurately. Application of the proposed method to a breast cancer dataset helps to identify genomic regions significantly associated with the disease.
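
The two-stage idea in the abstract, standardizing each array on its own and then smoothing the result with an HMM, can be illustrated with a minimal sketch. This is not the PICR-CNV implementation, and the abstract does not specify these details: the median/MAD standardization, the three states (loss, normal, gain), their means, the shared standard deviation, and the transition probability are all assumptions chosen for illustration.

```python
import numpy as np


def standardize_array(abundance):
    """Center and scale one array's per-SNP abundance using only that array.

    Assumption: median/MAD is used here as a stand-in; the paper's own
    standardization may differ.
    """
    x = np.asarray(abundance, dtype=float)
    center = np.median(x)
    # 1.4826 * MAD is a robust estimate of the standard deviation.
    scale = 1.4826 * np.median(np.abs(x - center))
    return (x - center) / scale


def viterbi_gaussian(z, means, sd, stay_prob):
    """Most likely state path for an HMM with Gaussian emissions.

    All parameters are hypothetical: `means` holds one emission mean per
    state, `sd` is shared by all states, and `stay_prob` is the probability
    of remaining in the current state between adjacent SNPs.
    """
    z = np.asarray(z, dtype=float)
    means = np.asarray(means, dtype=float)
    k = len(means)

    # Log transition matrix: stay_prob on the diagonal, the rest split evenly.
    log_trans = np.log(np.full((k, k), (1.0 - stay_prob) / (k - 1)))
    np.fill_diagonal(log_trans, np.log(stay_prob))

    # Log emission densities, up to a constant shared by all states.
    log_emit = -0.5 * ((z[:, None] - means[None, :]) / sd) ** 2

    score = np.log(np.full(k, 1.0 / k)) + log_emit[0]  # uniform start
    back = np.zeros((len(z), k), dtype=int)
    for t in range(1, len(z)):
        cand = score[:, None] + log_trans  # cand[i, j]: move from i to j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(k)] + log_emit[t]

    # Trace the best path backwards from the best final state.
    path = np.empty(len(z), dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(len(z) - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

With states ordered as loss, normal, gain, a call such as `viterbi_gaussian(standardize_array(x), means=(-3, 0, 3), sd=1.0, stay_prob=0.99)` returns one state index per SNP. Because the standardization uses statistics from the same array only, no information is shared between samples, which is the property the abstract emphasizes.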
