Daura-Oller Elias, Cabré Maria, Montero Miguel A, Paternáin José L, Romeu Antoni
Biochemistry and Biotechnology Department, Faculty of Chemistry, Rovira i Virgili University (URV), c/Marcel·lí Domingo, s/n. Campus Sescelades, 43007 Tarragona, Spain.
Comp Funct Genomics. 2009;2009:549387. doi: 10.1155/2009/549387. Epub 2009 Apr 8.
In the present study, a positive training set of 30 known human imprinted gene coding regions is compared with a set of 72 randomly sampled human nonimprinted gene coding regions (negative training set) to identify genomic features common to human imprinted genes. The main strength of this work is its use of multivariate analysis to examine variation at the coding-region DNA level between imprinted and nonimprinted genes. When quantitative genomic data are analysed with appropriate multivariate methods, namely principal components analysis (PCA) and quadratic discriminant analysis (QDA), a force affecting genomic parameters becomes apparent. We show that variables such as CG content, [bp]% CpG islands, [bp]% Large Tandem Repeats, and [bp]% Simple Repeats are able to distinguish coding regions of human imprinted genes.
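To illustrate how quadratic discriminant analysis assigns a coding region to the positive (imprinted) or negative (nonimprinted) training set, here is a minimal pure-Python sketch. It is not the authors' implementation: the two features, every numeric value, and the names `QDA`, `covariance`, and `solve_and_logdet` are hypothetical, and the small ridge added to each covariance diagonal is an assumption made to keep the matrices invertible. Each class gets its own mean and covariance, and a point is assigned to the class with the largest Gaussian discriminant score. PCA is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean(rows: Sequence[Vector]) -> Vector:
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance(rows: Sequence[Vector], mu: Vector, ridge: float) -> Matrix:
    """Sample covariance (n - 1 denominator) plus an assumed small ridge on the diagonal."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [r[j] - mu[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return cov


def solve_and_logdet(a: Matrix, b: Vector) -> Tuple[Vector, float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting; also return log|det(a)|."""
    d = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy, inputs untouched
    logdet = 0.0
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        logdet += math.log(abs(p))
        for r in range(col + 1, d):
            f = m[r][col] / p
            for c in range(col, d + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * d
    for i in range(d - 1, -1, -1):  # back substitution
        s = m[i][d] - sum(m[i][j] * x[j] for j in range(i + 1, d))
        x[i] = s / m[i][i]
    return x, logdet


class QDA:
    """Per-class Gaussian with its own covariance; predict the class with the highest score."""

    def __init__(self, ridge: float = 1e-6):
        self.ridge = ridge
        self.params = {}

    def fit(self, X: Sequence[Vector], y: Sequence[str]) -> "QDA":
        n = len(X)
        for label in sorted(set(y)):
            rows = [x for x, lab in zip(X, y) if lab == label]
            mu = mean(rows)
            log_prior = math.log(len(rows) / n)  # class prior from training-set sizes
            self.params[label] = (mu, covariance(rows, mu, self.ridge), log_prior)
        return self

    def score(self, x: Vector, label: str) -> float:
        # delta_k(x) = -1/2 log|S_k| - 1/2 (x - mu_k)^T S_k^{-1} (x - mu_k) + log pi_k
        mu, cov, log_prior = self.params[label]
        diff = [xi - mi for xi, mi in zip(x, mu)]
        sol, logdet = solve_and_logdet(cov, diff)
        mahalanobis = sum(di * si for di, si in zip(diff, sol))
        return -0.5 * logdet - 0.5 * mahalanobis + log_prior

    def predict(self, x: Vector) -> str:
        return max(self.params, key=lambda lab: self.score(x, lab))


# Invented toy rows: [CG content fraction, CpG-island bp fraction]; not data from the study.
positive = [[0.60, 0.10], [0.62, 0.14], [0.58, 0.12], [0.61, 0.09], [0.59, 0.15]]
negative = [[0.45, 0.02], [0.48, 0.05], [0.42, 0.03], [0.47, 0.01], [0.44, 0.06]]
X = positive + negative
y = ["positive"] * len(positive) + ["negative"] * len(negative)

model = QDA().fit(X, y)
print(model.predict([0.60, 0.12]))  # → positive
```

Unlike linear discriminant analysis, QDA does not pool the covariance across classes. This lets the decision boundary bend when the two training sets differ in spread, at the cost of estimating more parameters per class.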