Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN 55414, USA.
Division of Pulmonary, Allergy and Critical Care, University of Minnesota, Minneapolis, MN 55455, USA.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae470.
Epidemiologic and genetic studies of many complex diseases suggest subgroup disparities (e.g. by sex or race) in disease course and patient outcomes. We consider this problem from the standpoint of integrative analysis, which combines information from different views (e.g. genomics, proteomics, clinical data). Existing integrative analysis methods ignore subgroup heterogeneity, while stacking the views and then accounting for subgroup heterogeneity fails to model the associations among the views. We propose Heterogeneity in Integration and Prediction (HIP), a statistical approach for joint association and prediction that leverages the strengths of each view to identify molecular signatures that are shared across subgroups or specific to a subgroup. We apply HIP to proteomics and gene expression data pertaining to chronic obstructive pulmonary disease (COPD) to identify proteins and genes, shared by or unique to males and females, that contribute to the variation in COPD as measured by airway wall thickness. Our COPD analysis identified proteins, genes, and pathways that are common to, or specific to, males and females; some have been implicated in COPD, while others could lead to new insights into sex differences in COPD mechanisms. HIP accounts for subgroup heterogeneity in multi-view data, ranks variables by importance, is applicable to univariate or multivariate continuous outcomes, and incorporates covariate adjustment. With efficient algorithms implemented in PyTorch, the method has many potential scientific applications and could enhance multiomics research in health disparities. HIP is available at https://github.com/lasandrall/HIP, with a video tutorial at https://youtu.be/O6E2OLmeMDo and, for users with limited programming experience, a Shiny application at https://multi-viewlearn.shinyapps.io/HIP_ShinyApp/.