Genotype-driven identification of a molecular network predictive of advanced coronary calcium in ClinSeq® and Framingham Heart Study cohorts.

Author information

Oguz Cihan, Sen Shurjo K, Davis Adam R, Fu Yi-Ping, O'Donnell Christopher J, Gibbons Gary H

Affiliations

Cardiovascular Disease Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.

Office of Biostatistics Research, Division of Cardiovascular Sciences, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA.

Publication information

BMC Syst Biol. 2017 Oct 26;11(1):99. doi: 10.1186/s12918-017-0474-5.

Abstract

BACKGROUND

One goal of personalized medicine is to leverage the emerging tools of data science to guide medical decision-making. Achieving this with disparate data sources is most daunting for polygenic traits. To this end, we employed random forests (RFs) and neural networks (NNs) for predictive modeling of coronary artery calcium (CAC), an intermediate endophenotype of coronary artery disease (CAD).

METHODS

Model inputs were derived from advanced cases in the ClinSeq® discovery cohort (n=16) and the Framingham Heart Study (FHS) replication cohort (n=36), all within the 89th-99th CAC score percentile range, and from age-matched controls (ClinSeq® n=16, FHS n=36) with no detectable CAC. All subjects were Caucasian males. The inputs included clinical variables and the genotypes of the 56 single nucleotide polymorphisms (SNPs) ranked highest by nominal correlation with the advanced CAC state in the discovery cohort. Predictive performance was assessed by computing the area under the receiver operating characteristic curve (ROC-AUC).
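The two quantitative steps in the methods, ranking SNPs by their correlation with case status and scoring a predictor by ROC-AUC, can be illustrated with a minimal sketch. It assumes genotypes are coded as minor-allele counts (0, 1, 2) and uses a plain Pearson correlation; the abstract does not specify the coding or the correlation statistic, and the function names, SNP ids, and toy data below are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no signal
    return sxy / sqrt(sxx * syy)

def rank_snps(genotypes, labels, top_k):
    """Return the top_k SNP ids ordered by |correlation| with case status.

    genotypes: dict mapping SNP id -> list of allele counts (assumed 0/1/2 coding)
    labels:    list of 0 (control) / 1 (advanced CAC case)
    """
    ranked = sorted(
        genotypes,
        key=lambda snp: abs(pearson(genotypes[snp], labels)),
        reverse=True,
    )
    return ranked[:top_k]

def roc_auc(scores, labels):
    """ROC-AUC as the fraction of (case, control) pairs in which the case
    has the higher score; ties count as half a win."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data (not from the study): 3 cases followed by 3 controls.
labels = [1, 1, 1, 0, 0, 0]
genotypes = {
    "snp_a": [2, 2, 1, 0, 0, 1],
    "snp_b": [1, 0, 1, 1, 0, 1],
    "snp_c": [0, 1, 0, 2, 2, 1],
}
top = rank_snps(genotypes, labels, top_k=2)
auc = roc_auc(genotypes["snp_a"], labels)
```

Ranking on the absolute value keeps SNPs whose minor allele is protective as well as those whose minor allele confers risk. Note that selecting SNPs and evaluating them in the same cohort tends to inflate the apparent ROC-AUC, which is why a separate replication cohort matters.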

RESULTS

RF models trained and tested with clinical variables generated ROC-AUC values of 0.69 and 0.61 in the discovery and replication cohorts, respectively. In contrast, in both cohorts, the set of SNPs derived from the discovery cohort was highly predictive (ROC-AUC ≥0.85), with no significant change in predictive performance upon integration of clinical and genotype variables. Using the 21 SNPs that produced optimal predictive performance in both cohorts, we developed NN models trained with ClinSeq® data and tested with FHS data, and obtained high predictive accuracy (ROC-AUC=0.80-0.85) with several topologies. Several biological processes related to CAD and "vascular aging" were enriched in the network of genes constructed from the predictive SNPs.
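The train-on-discovery, test-on-replication design described above can be sketched as follows. This is not the authors' RF or NN model: as a deliberately simple stand-in, it fits a signed allele-count score whose per-SNP directions are learned from the discovery cohort only, then evaluates that frozen score on the replication cohort. The allele-count coding, function names, and toy cohorts are all assumptions made for illustration.

```python
def roc_auc(scores, labels):
    """ROC-AUC as the fraction of (case, control) pairs in which the case
    has the higher score; ties count as half a win."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit_directions(X, y):
    """Learn one sign per SNP from the training cohort: +1 if the mean allele
    count is higher in cases than in controls, -1 if lower, 0 if equal."""
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    signs = []
    for j in range(len(X[0])):
        case_mean = sum(row[j] for row in cases) / len(cases)
        control_mean = sum(row[j] for row in controls) / len(controls)
        diff = case_mean - control_mean
        signs.append((diff > 0) - (diff < 0))
    return signs

def predict(X, signs):
    """Score each subject as the signed sum of allele counts."""
    return [sum(s * g for s, g in zip(signs, row)) for row in X]

# Hypothetical toy cohorts (not study data); rows are subjects, columns are SNPs.
X_discovery = [[2, 0, 1], [1, 0, 1], [2, 1, 0], [0, 2, 1], [0, 1, 1], [1, 2, 0]]
y_discovery = [1, 1, 1, 0, 0, 0]
X_replication = [[2, 1, 1], [1, 0, 0], [0, 2, 1], [2, 1, 0]]
y_replication = [1, 1, 0, 0]

# Fit on the discovery cohort only, then score both cohorts with the frozen model.
signs = fit_directions(X_discovery, y_discovery)
auc_discovery = roc_auc(predict(X_discovery, signs), y_discovery)
auc_replication = roc_auc(predict(X_replication, signs), y_replication)
```

The key design point is that nothing about the replication cohort reaches `fit_directions`, so `auc_replication` estimates out-of-sample performance, while `auc_discovery` is an in-sample figure and is usually the more optimistic of the two.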

CONCLUSIONS

We identified a molecular network predictive of advanced coronary calcium using genotype data from the ClinSeq® and FHS cohorts. Our results illustrate that machine learning tools, which utilize complex interactions between disease predictors intrinsic to the pathogenesis of polygenic disorders, hold promise for deriving predictive disease models and networks.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3aa8/5659034/20ee69609db9/12918_2017_474_Fig1_HTML.jpg
