Platt Daniel E, Basu Saugata, Zalloua Pierre A, Parida Laxmi
Computational Biology Center, IBM T. J. Watson Research Center, 1101 Kitchawan Rd., Yorktown Hgts, 10598, NY, USA.
Department of Mathematics, Purdue University, 150 N. University St., West Lafayette, 47907, IN, USA.
BMC Syst Biol. 2016 Jan 11;10 Suppl 1(Suppl 1):10. doi: 10.1186/s12918-015-0251-2.
Complex diseases may arise through multiple pathways. For example, coronary artery disease evolves from damage to the epithelial layers of the arteries, but has multiple causal pathways. More challenging still, those pathways are highly correlated within metabolic syndrome. The challenge is to identify specific clusters of phenotype characteristics (composite phenotypes) that may reflect these different etiologies. Further, a GWAS that seeks SNPs satisfying multiple composite phenotype descriptions allows for lower false positive rates at lower α thresholds, which opens the possibility of reducing false negatives. This may provide a window into the missing heritability problem.
We identify significant phenotype patterns, and identify fuzzy redescriptions among those patterns using Jaccard distances. Further, we construct Vietoris-Rips complexes from the Jaccard distances and compute their persistent homology. The patterns comprising these topological features are identified as composite phenotypes, whose genetic associations are explored with logistic regression applied to pathways and to GWAS.
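The distance and filtration steps can be illustrated with a minimal sketch. It assumes each phenotype pattern is represented by the set of subjects who match it; the pattern names and subject IDs below are hypothetical and not taken from the study. It computes only zero-dimensional persistence (the scales at which connected components of the Vietoris-Rips complex merge), which is equivalent to single-linkage clustering; higher-dimensional features would need a dedicated persistent homology library.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def h0_death_scales(items):
    """Scales at which components merge in the Vietoris-Rips filtration.

    Edges are added in order of increasing Jaccard distance; each time an
    edge joins two separate components, one H0 class dies at that distance.
    """
    names = list(items)
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(
        (jaccard_distance(items[u], items[v]), u, v)
        for u, v in combinations(names, 2)
    )
    deaths = []
    for dist, u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            deaths.append(dist)
    return deaths


# Hypothetical patterns: each maps to the subject IDs that satisfy it.
patterns = {
    "obesity+hypertension": {1, 2, 3, 4},
    "obesity+diabetes": {2, 3, 4, 5},
    "diabetes+dyslipidemia": {3, 4, 5, 6},
    "smoking": {7, 8, 9},
}
deaths = h0_death_scales(patterns)
print(deaths)
```

In this toy input the three overlapping patterns merge at a small distance, while the disjoint pattern joins only at distance 1. Components that persist over a long range of scales are the kind of feature the abstract treats as candidate composite phenotypes.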
We identified several phenotypes that tended to be dominated by metabolic syndrome descriptions, and which were distinct among the combinations of metabolic syndrome conditions. Among SNPs marking the RAAS complex, various SNPs were associated specifically with different groups of composite phenotypes, and also distinguished the composite phenotypes from simple phenotypes. Each of these showed different genetic associations, namely rs6693954, rs762551, rs1378942, and rs1133323. GWAS-identified SNPs associated with composite phenotypes included rs12365545, rs6847235, and rs701319. Eighteen GWAS-identified SNPs appeared in combinations supported in composite combinations with greater power than for any individual phenotype.
We find systematic associations among metabolic syndrome variates that show distinctive genetic association profiles. Further, the systematic characterization involves composite phenotype descriptions that combine the power of individual phenotype GWAS tests, yielding more significance for lower individual thresholds and permitting the exploration of SNPs that would otherwise appear as false negatives.
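One reading of the combined-power argument can be made concrete with a small calculation. The sketch below assumes the k phenotype tests are statistically independent under the null hypothesis. That is an illustrative assumption, not a claim of the study, and since metabolic syndrome phenotypes are correlated, the true joint false positive rate would be higher than this idealized value. The function names are hypothetical.

```python
def joint_false_positive_rate(alpha, k):
    """Chance that a null SNP passes all k tests at threshold alpha.

    ASSUMPTION: the k tests are independent under the null hypothesis.
    """
    return alpha ** k


def per_test_alpha(target, k):
    """Per-test threshold whose joint rate over k independent tests equals target."""
    return target ** (1.0 / k)


# Three tests at a relaxed threshold of 0.01 each.
joint = joint_false_positive_rate(0.01, 3)

# Per-test threshold that matches a genome-wide level of 5e-8 across three tests.
relaxed = per_test_alpha(5e-8, 3)
print(joint, relaxed)
```

Under the independence assumption, requiring a SNP to pass several phenotype tests lets each individual test use a far less stringent threshold while holding the joint false positive rate fixed. Weaker signals that would miss a single-phenotype genome-wide cutoff could then be retained.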