de Leeuw Nicole, Dijkhuizen Trijnie, Hehir-Kwa Jayne Y, Carter Nigel P, Feuk Lars, Firth Helen V, Kuhn Robert M, Ledbetter David H, Martin Christa Lese, van Ravenswaaij-Arts Conny M A, Scherer Steven W, Shams Soheil, Van Vooren Steven, Sijmons Rolf, Swertz Morris, Hastings Ros
Department of Human Genetics, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands.
Hum Mutat. 2012 Jun;33(6):930-40. doi: 10.1002/humu.22049.
The range of commercially available array platforms and analysis software packages is expanding and their utility is improving, making reliable detection of copy-number variants (CNVs) relatively straightforward. Reliable interpretation of CNV data, however, is often difficult and requires expertise. As our knowledge of the human genome grows rapidly, applications for array testing continue to broaden, and the resolution of CNV detection increases, interpreting what can be daunting data becomes highly complex. Correct CNV interpretation and optimal use of the genotype information provided by single-nucleotide polymorphism probes on an array depend largely on knowledge present in various resources. In addition to host laboratories' own datasets and national registries, there are several public databases and Internet resources with genotype and phenotype information that can be used for array data interpretation. With so many resources now available, it is important to know which are fit for purpose in a diagnostic setting. We summarize the characteristics of the most commonly used Internet databases and resources, and propose a general data interpretation strategy that can be used for comparative hybridization, comparative intensity, and genotype-based array data.