MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom.
Department of Epidemiology and Biostatistics, School of Public Health, St Mary's Hospital, Imperial College London, London, United Kingdom.
PLoS Genet. 2022 Jan 27;18(1):e1009975. doi: 10.1371/journal.pgen.1009975. eCollection 2022 Jan.
Clustering genetic variants based on their associations with different traits can provide insight into their underlying biological mechanisms. Existing clustering approaches typically group variants based on the similarity of their association estimates for various traits. We present a new procedure that clusters variants by their proportional associations with different traits, which better reflects the underlying mechanisms to which the variants relate. The method is based on a mixture model approach for directional clustering and includes a noise cluster that provides robustness to outliers. The procedure performs well across a range of simulation scenarios. In an applied setting, clustering genetic variants associated with body mass index generates groups reflective of distinct biological pathways. Mendelian randomization analyses support the view that the clusters differ in their effects on coronary heart disease, including one cluster that represents elevated body mass index with a favourable metabolic profile and reduced coronary heart disease risk. Analysis of the biological pathways underlying this cluster identifies inflammation as potentially explaining differences in the effects of increased body mass index on coronary heart disease.
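The distinction between clustering on the similarity of association estimates and clustering on proportional associations can be illustrated with a small sketch. It is not the authors' mixture model and includes no noise cluster; it only shows, for made-up numbers, that two variants with proportional association vectors point in the same direction even when their estimates are far apart in Euclidean distance. The function name `unit_directions` and the values in `betas` are hypothetical, and the sketch assumes no variant has an all-zero association vector.

```python
import numpy as np

def unit_directions(betas):
    """Scale each variant's association vector to unit length.

    Assumes no row is all zeros (a zero vector has no direction).
    """
    betas = np.asarray(betas, dtype=float)
    norms = np.linalg.norm(betas, axis=1, keepdims=True)
    return betas / norms

# Hypothetical association estimates: rows are variants, columns are traits.
betas = np.array([
    [0.10, 0.20],   # variant 1
    [0.30, 0.60],   # variant 2: exactly 3x variant 1 (proportional)
    [0.10, -0.20],  # variant 3: similar magnitude, different direction
])

dirs = unit_directions(betas)

# Cosine similarity between variants (dot products of unit vectors).
cosine = dirs @ dirs.T

# Euclidean distances between the raw estimates.
euclid_12 = np.linalg.norm(betas[0] - betas[1])
euclid_13 = np.linalg.norm(betas[0] - betas[2])

print("cosine(1,2) =", round(cosine[0, 1], 3))
print("cosine(1,3) =", round(cosine[0, 2], 3))
print("euclid(1,2) =", round(euclid_12, 3))
print("euclid(1,3) =", round(euclid_13, 3))
```

In this toy example, variant 1 is closer to variant 3 than to variant 2 in Euclidean distance, yet variants 1 and 2 share the same direction. A direction-based approach would group variants 1 and 2 together, whereas a distance-based approach on the raw estimates may not.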