Waaijenborg Sandra, Zwinderman Aeilko H
Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1100 DD Amsterdam, The Netherlands.
Algorithms Mol Biol. 2010 Feb 11;5:17. doi: 10.1186/1748-7188-5-17.
The causes of complex diseases are difficult to grasp because many different factors play a role in their onset. To find a common genetic background, many existing studies divide their population into cases and controls, a classification that is likely to cause heterogeneity within the two groups. Rather than dividing the study population into cases and controls, it is better to characterize the phenotype of a complex disease by a set of intermediate risk factors. However, these risk factors often vary over time and are therefore measured repeatedly.
We introduce a method to associate multiple repeatedly measured intermediate risk factors with a high-dimensional set of single nucleotide polymorphisms (SNPs). In a two-step approach, we first summarize the time course of each individual and then apply these summaries in penalized nonlinear canonical correlation analysis to obtain sparse results.
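The abstract does not say how the time courses are summarized or how the penalty is applied. The sketch below makes two loud assumptions. First, each individual's curve for each risk factor is summarized by an ordinary least-squares intercept (the level at t = 0) and slope (the progression). Second, a simplified *linear* sparse CCA is swapped in for the authors' penalized nonlinear CCA: alternating soft-thresholded updates on the cross-correlation matrix, in the spirit of penalized matrix decomposition. All function names are hypothetical. The simulated data, in which SNP 0 shifts the level of risk factor A, are made up for illustration.

```python
import math
import random


def summarize_time_course(times, values):
    """Step 1 (assumed summary): OLS intercept and slope of one individual's
    repeated measurements of one risk factor."""
    n = len(times)
    t_mean, v_mean = sum(times) / n, sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx if sxx > 0 else 0.0
    return v_mean - slope * t_mean, slope  # (level at t = 0, rate of change)


def standardize(columns):
    """Center each column (a list of floats) and scale it to unit variance."""
    out = []
    for col in columns:
        m = sum(col) / len(col)
        sd = math.sqrt(sum((a - m) ** 2 for a in col) / len(col)) or 1.0
        out.append([(a - m) / sd for a in col])
    return out


def soft_threshold(vec, lam):
    """Lasso-type shrinkage: entries with |a| <= lam become exactly zero."""
    return [math.copysign(max(abs(a) - lam, 0.0), a) for a in vec]


def normalize(vec):
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec] if norm > 0 else vec


def sparse_cca(x_cols, y_cols, lam_x, lam_y, iters=50):
    """Step 2 (hypothetical linear stand-in): rank-one sparse CCA. lam_x and
    lam_y must stay below the largest entries, or all weights become zero."""
    x, y = standardize(x_cols), standardize(y_cols)
    n = len(x[0])
    cross = [[sum(a * b for a, b in zip(xj, yk)) / n for yk in y] for xj in x]

    def times_v(v):  # C v
        return [sum(c * b for c, b in zip(row, v)) for row in cross]

    def times_u(u):  # C^T u
        return [sum(cross[j][k] * u[j] for j in range(len(u)))
                for k in range(len(y))]

    v = normalize([1.0] * len(y))
    for _ in range(20):  # unpenalized power iteration as a starting point
        v = normalize(times_u(times_v(v)))
    for _ in range(iters):  # alternate sparse updates of both weight vectors
        u = normalize(soft_threshold(times_v(v), lam_x))
        v = normalize(soft_threshold(times_u(u), lam_y))
    return u, v


def canonical_correlation(x_cols, y_cols, u, v):
    """Pearson correlation between the two weighted canonical variates."""
    x, y = standardize(x_cols), standardize(y_cols)
    n = len(x[0])
    sx = [sum(w * col[i] for w, col in zip(u, x)) for i in range(n)]
    sy = [sum(w * col[i] for w, col in zip(v, y)) for i in range(n)]
    a, b = standardize([sx, sy])
    return sum(p * q for p, q in zip(a, b)) / n


# Made-up simulation: 200 people, 10 SNPs coded 0/1/2, two risk factors
# measured at four time points; only SNP 0 affects the level of factor A.
random.seed(1)
times = [0.0, 1.0, 2.0, 3.0]
n_people, n_snps = 200, 10
snps = [[random.choice([0, 1, 2]) for _ in range(n_people)]
        for _ in range(n_snps)]
summaries = [[], [], [], []]  # intercept A, slope A, intercept B, slope B
for i in range(n_people):
    level_a = 2.0 * snps[0][i] + random.gauss(0, 1)
    for f, level in enumerate([level_a, random.gauss(0, 1)]):
        slope = random.gauss(0, 0.1)
        values = [level + slope * t + random.gauss(0, 0.5) for t in times]
        intercept, rate = summarize_time_course(times, values)
        summaries[2 * f].append(intercept)
        summaries[2 * f + 1].append(rate)

u, v = sparse_cca(summaries, snps, lam_x=0.1, lam_y=0.15)
rho = canonical_correlation(summaries, snps, u, v)
print("risk-factor weights:", [round(w, 2) for w in u])
print("SNP weights:", [round(w, 2) for w in v])
print("canonical correlation:", round(rho, 2))
```

In this toy setup the weights should concentrate on the intercept of risk factor A and on SNP 0, with most other SNP weights shrunk to exactly zero. This mirrors, on simulated data only, the level-versus-progression contrast the abstract describes. Any nonlinear transformation of the genotype codes is not modeled here; SNPs enter as additive 0/1/2 counts.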
Applying this method to two datasets that study the genetic background of cardiovascular diseases shows that, compared with progression over time, it is mainly the constant levels over time that are associated with sets of SNPs.