Butts Jessica, Verace Leif, Wendt Christine, Bowler Russell, Hersh Craig P, Long Qi, Eberly Lynn, Safo Sandra E
Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota, USA.
Division of Pulmonary, Allergy and Critical Care, University of Minnesota, Minneapolis, Minnesota, USA.
Stat Med. 2025 Apr;44(8-9):e70036. doi: 10.1002/sim.70036.
Multiple data views measured on the same set of participants are becoming more common, and analyzing these views simultaneously has the potential to deepen our understanding of many complex diseases. Equally important, many of these diseases show evidence of subgroup heterogeneity (e.g., by sex or race). HIP (Heterogeneity in Integration and Prediction) is among the first methods proposed to integrate multiple data views while also accounting for subgroup heterogeneity to identify common and subgroup-specific markers of a particular disease. However, HIP is applicable only to continuous outcomes and requires programming expertise from the user. Here we propose extensions to HIP that accommodate multi-class, Poisson, and Zero-Inflated Poisson outcomes while retaining the benefits of HIP. Additionally, we introduce an R Shiny application, accessible on shinyapps.io at https://multi-viewlearn.shinyapps.io/HIP_ShinyApp/, that provides an interface to the Python implementation of HIP so that more researchers can use the method anywhere and on any device. We applied HIP to identify genes and proteins, both common to and specific to males and females, that are associated with exacerbation frequency in chronic obstructive pulmonary disease (COPD). Although some of the identified genes and proteins show evidence of a relationship with COPD in existing literature, others may be candidates for future research investigating their relationship with COPD. We demonstrate the use of the Shiny application with publicly available data. An R package for HIP is available at https://github.com/lasandrall/HIP.
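For readers unfamiliar with the Zero-Inflated Poisson outcome type mentioned above, the following is a minimal sketch of the standard zero-inflated Poisson log-likelihood. It is not taken from the HIP code base and is not HIP's objective function; the function names `zip_log_pmf` and `zip_neg_log_lik` and the example counts are hypothetical, chosen only for illustration. The model assumes each count is a structural zero with probability `pi` and otherwise follows a Poisson distribution with mean `lam`.

```python
import math


def zip_log_pmf(y, lam, pi):
    """Log-probability of count y under a zero-inflated Poisson model.

    lam: Poisson mean (must be > 0).
    pi:  probability of a structural zero (0 <= pi < 1).
    Hypothetical helper for illustration; not part of HIP.
    """
    if y == 0:
        # A zero arises either structurally or from the Poisson component.
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    # Positive counts can only come from the Poisson component.
    log_poisson = -lam + y * math.log(lam) - math.lgamma(y + 1)
    return math.log(1.0 - pi) + log_poisson


def zip_neg_log_lik(counts, lam, pi):
    """Negative log-likelihood of i.i.d. counts under the same model."""
    return -sum(zip_log_pmf(y, lam, pi) for y in counts)


if __name__ == "__main__":
    # Made-up counts with many zeros, e.g. events per participant.
    example_counts = [0, 0, 0, 1, 0, 2, 0, 0, 3, 0]
    print(zip_neg_log_lik(example_counts, lam=1.5, pi=0.4))
```

Setting `pi = 0` recovers the ordinary Poisson log-likelihood, which is one way to see how the two count outcome types named in the abstract relate.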