Biomathematics Research Group, Department of Mathematics, University of Turku, Turku, Finland.
PLoS Genet. 2010 Sep 30;6(9):e1001146. doi: 10.1371/journal.pgen.1001146.
The relative contribution of genetic risk factors to the progression of subclinical atherosclerosis is poorly understood. It is likely that multiple variants are implicated in the development of atherosclerosis, but the subtle genotypic and phenotypic differences are beyond the reach of conventional case-control designs and the statistical significance testing procedures used in most association studies. Our objective here was to investigate whether an alternative approach, in which common disorders are treated as quantitative phenotypes that are continuously distributed over a population, can reveal predictive insights into early atherosclerosis, as assessed by ultrasound imaging-based quantitative measurement of carotid artery intima-media thickness (IMT). Using our population-based follow-up study of atherosclerosis precursors as a basis for sampling subjects with gradually increasing IMT levels, we searched for the subsets of genetic variants and their interactions that are most predictive of the various risk classes, rather than using exclusively those variants that meet a stringent level of statistical significance. The area under the receiver operating characteristic curve (AUC) was used to evaluate the predictive value of the variants, and cross-validation was used to assess how well the predictive models generalize to other subsets of subjects. With our predictive modeling framework and its machine learning-based SNP selection, we could improve the prediction of the extreme classes of atherosclerosis risk and of progression over a 6-year period (average AUC 0.844 and 0.761, respectively), compared with using conventional cardiovascular risk factors alone (average AUC 0.741 and 0.629) or conventional risk factors combined with the statistically significant variants (average AUC 0.762 and 0.651). The predictive accuracy remained relatively high in an independent validation set of subjects (average decrease in AUC of 0.043).
These results demonstrate that the modeling framework can utilize the "gray zone" of genetic variation in the classification of subjects with different degrees of risk of developing atherosclerosis.
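The evaluation scheme described in the abstract (AUC averaged over cross-validation folds) can be illustrated with a minimal Python sketch. It is not the authors' framework: the function names `auc`, `stratified_folds`, `fit_weights`, and `cross_validated_auc` are hypothetical, the data are synthetic, and the class-mean-difference scorer is only a toy stand-in for the machine learning-based SNP selection, whose details the abstract does not give.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive subject gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one subject from each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k, seed=0):
    """Split subject indices into k folds, dealing each class out
    round-robin so every fold contains both classes."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(members)
        for position, i in enumerate(members):
            folds[position % k].append(i)
    return folds


def fit_weights(X, y):
    """Toy scorer (an assumption, not the paper's model): weight each
    feature by the difference between its class means."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return [sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
            for j in range(len(X[0]))]


def cross_validated_auc(X, y, k=5, seed=0):
    """Fit on k-1 folds, score the held-out fold, and average the AUCs."""
    fold_aucs = []
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        w = fit_weights([X[i] for i in train], [y[i] for i in train])
        scores = [sum(wj * xj for wj, xj in zip(w, X[i])) for i in fold]
        fold_aucs.append(auc([y[i] for i in fold], scores))
    return sum(fold_aucs) / len(fold_aucs)


if __name__ == "__main__":
    # Synthetic genotypes coded 0/1/2 for three made-up SNPs; only the
    # first one is shifted in the high-risk class.
    rng = random.Random(1)
    y = [i % 2 for i in range(100)]
    X = [[min(2, rng.randint(0, 1) + label), rng.randint(0, 2), rng.randint(0, 2)]
         for label in y]
    print("mean cross-validated AUC:", cross_validated_auc(X, y, k=5))
```

Because the weights are always fitted on the training folds only, the averaged AUC estimates how the scorer performs on subjects it has not seen, which is the role cross-validation plays in the study. An independent validation set, as used in the study, is a further check beyond this.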