Cao Hongbao, Guo Wei, Qin Haide, Xu Mengyuan, Lehrman Benjamin, Tao Yu, Shugart Yin-Yao
Unit on Statistical Genomics, Division of Intramural Research Programs, National Institute of Mental Health, National Institutes of Health, Building 35, Room 3A 1000, 35 Convent Drive, Bethesda, MD 20892 USA.
BMC Proc. 2016 Oct 18;10(Suppl 7):283-288. doi: 10.1186/s12919-016-0044-7. eCollection 2016.
Although many genes have been implicated as hypertension candidates, few studies to date have integrated different types of genomic data for biomarker selection.
Applying a newly proposed sparse representation-based variable selection (SRVS) method to the Genetic Analysis Workshop 19 data, we analyzed a combined data set of 11,522 gene expression measures and 354,893 single-nucleotide polymorphisms (SNPs) from 397 subjects (cases/controls: 151/246), with the aim of identifying potential biomarkers for blood pressure from both gene expression and SNP data.
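The abstract does not describe how SRVS ranks variables. As a rough illustration of the general idea behind sparse-representation variable selection on a combined SNP and expression matrix, the sketch below concatenates both feature types for the same subjects, z-scores every column, codes case/control status as +1/-1, fits an L1-penalized least-squares (lasso) representation by iterative soft-thresholding (ISTA), and ranks variables by absolute coefficient. The lasso/ISTA solver, the functions `standardize`, `lasso_ista`, `select_top` and `make_toy_data`, the penalty `lam`, and the toy data are all hypothetical stand-ins, not the authors' SRVS implementation.

```python
# Hypothetical sketch: lasso (L1) sparse representation solved by ISTA, standing in
# for SRVS. Not the authors' algorithm; names, penalty and toy data are illustrative.
import math
import random


def standardize(columns):
    """Z-score each feature column; constant columns become all zeros."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        out.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return out


def soft_threshold(v, thr):
    """Proximal operator of thr*|v|: shrink v toward zero by thr."""
    if v > thr:
        return v - thr
    if v < -thr:
        return v + thr
    return 0.0


def lasso_ista(cols, y, lam, iters=300):
    """Minimize 0.5*||y - X a||^2 + lam*||a||_1 with X stored column-wise."""
    p, n = len(cols), len(y)
    # Step 1/L, with L = ||X||_F^2 >= largest eigenvalue of X^T X (safe but conservative).
    step = 1.0 / sum(v * v for col in cols for v in col)
    a = [0.0] * p
    for _ in range(iters):
        r = list(y)  # residual y - X a
        for j in range(p):
            if a[j] != 0.0:
                for i in range(n):
                    r[i] -= cols[j][i] * a[j]
        # Gradient step on the squared-error term, then soft-threshold (simultaneous update).
        a = [soft_threshold(a[j] + step * sum(c * ri for c, ri in zip(cols[j], r)), step * lam)
             for j in range(p)]
    return a


def select_top(snp_cols, expr_cols, names, labels, k, lam):
    """Concatenate SNP and expression features, fit, and rank nonzero variables by |coef|."""
    cols = standardize(snp_cols + expr_cols)
    mean_y = sum(labels) / len(labels)
    y = [v - mean_y for v in labels]  # centered +1/-1 case/control coding
    coef = lasso_ista(cols, y, lam)
    order = sorted(range(len(cols)), key=lambda j: abs(coef[j]), reverse=True)
    return [(names[j], coef[j]) for j in order[:k] if coef[j] != 0.0]


def make_toy_data(seed=0, n_cases=80, n_controls=120):
    """Hypothetical toy data: snp0 and expr0 track case status; other columns are noise."""
    rng = random.Random(seed)
    labels = [1.0] * n_cases + [-1.0] * n_controls
    snp0 = [float(rng.choice([1, 2, 2]) if y > 0 else rng.choice([0, 0, 1])) for y in labels]
    snp_noise = [[float(rng.randint(0, 2)) for _ in labels] for _ in range(2)]
    expr0 = [y + rng.gauss(0.0, 1.0) for y in labels]
    expr_noise = [[rng.gauss(0.0, 1.0) for _ in labels] for _ in range(2)]
    names = ["snp0", "snp1", "snp2", "expr0", "expr1", "expr2"]
    return [snp0] + snp_noise, [expr0] + expr_noise, names, labels


if __name__ == "__main__":
    print(select_top(*make_toy_data(), k=2, lam=20.0))
```

On the toy data, `select_top(*make_toy_data(), k=2, lam=20.0)` is expected to rank `snp0` and `expr0` first. Z-scoring before fitting matters here because genotype codes (0/1/2) and expression values live on different scales; without it, the L1 penalty would favor whichever block happens to have the larger variance.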
Among the top 1000 variables selected (575 SNPs and 425 gene expression measures), bioinformatics analysis showed that 302 were plausibly associated with blood pressure. In addition, we identified 173 variables associated with body weight and 84 associated with left ventricular contractility. In total, 559 (55.9%) of the top 1000 variables were associated with blood pressure-related phenotypes (348 SNPs and 211 gene expression measures).
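The three phenotype counts above account exactly for the 55.9% figure and match the SNP/gene expression split. The short check below recomputes the summary from the reported numbers only. It assumes, as the matching totals suggest but the abstract does not state outright, that each variable is counted under a single phenotype; the constant and function names are illustrative.

```python
# Counts reported in the abstract for the top 1000 variables; names are illustrative.
TOP_N = 1000
SELECTED = {"snp": 575, "expression": 425}
BY_PHENOTYPE = {"blood pressure": 302, "body weight": 173, "lv contractility": 84}
ASSOCIATED_BY_TYPE = {"snp": 348, "expression": 211}


def tally():
    """Recompute the summary figures, assuming each variable counts under one phenotype."""
    total = sum(BY_PHENOTYPE.values())  # 302 + 173 + 84
    return {
        "total": total,
        "percent": 100.0 * total / TOP_N,
        # Both breakdowns should sum to the same total, and the selection to TOP_N.
        "consistent": total == sum(ASSOCIATED_BY_TYPE.values())
        and sum(SELECTED.values()) == TOP_N,
        "snp_rate": ASSOCIATED_BY_TYPE["snp"] / SELECTED["snp"],
        "expression_rate": ASSOCIATED_BY_TYPE["expression"] / SELECTED["expression"],
    }


if __name__ == "__main__":
    print(tally())
```

The same numbers also show that about 60.5% of the selected SNPs (348/575) versus about 49.6% of the selected gene expression measures (211/425) were linked to a blood pressure-related phenotype.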
Our results support the feasibility of using the SRVS algorithm to integrate multiple data sets with different structures for comprehensive analysis.