Department of Electrical Engineering, University of Minnesota, Minneapolis, MN 55455, United States.
Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN 55455, United States.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae339.
Biomedical research now commonly integrates diverse data types or views from the same individuals to better understand the pathobiology of complex diseases, but the challenge lies in meaningfully integrating these diverse views. Existing methods often require the same type of data from all views (cross-sectional data only or longitudinal data only) or do not consider any class outcome in the integration method, which presents limitations. To overcome these limitations, we have developed a pipeline that harnesses the power of statistical and deep learning methods to integrate cross-sectional and longitudinal data from multiple sources. In addition, it identifies key variables that contribute to the association between views and the separation between classes, providing deeper biological insights. This pipeline includes variable selection/ranking using linear and nonlinear methods, feature extraction using functional principal component analysis and Euler characteristics, and joint integration and classification using dense feed-forward networks for cross-sectional data and recurrent neural networks for longitudinal data. We applied this pipeline to cross-sectional and longitudinal multiomics data (metagenomics, transcriptomics and metabolomics) from an inflammatory bowel disease (IBD) study and identified microbial pathways, metabolites and genes that discriminate by IBD status, providing information on the etiology of IBD. We conducted simulations to compare the two feature extraction methods.
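The functional principal component analysis step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes every subject is observed on the same regular time grid with no missing visits, skips any smoothing, and approximates the eigenfunctions by an SVD of the centered curves. The function name `fpca_scores` and the synthetic data are hypothetical and serve only as illustration.

```python
import numpy as np


def fpca_scores(curves, n_components=2):
    """Discretized FPCA for one longitudinal variable.

    curves: array of shape (n_subjects, n_timepoints), all subjects
    measured on the same regular time grid (an assumption of this sketch).
    Returns per-subject scores, the discretized eigenfunctions, and the
    fraction of variance explained by each retained component.
    """
    curves = np.asarray(curves, dtype=float)
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are orthonormal directions over the time grid; they play
    # the role of eigenfunctions of the sample covariance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Project each centered curve onto the retained eigenfunctions.
    scores = centered @ components.T
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, components, explained[:n_components]


# Synthetic example (illustrative only): 30 subjects, 20 time points,
# each curve a random mix of two smooth shapes plus small noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20)
amplitudes = rng.normal(size=(30, 2)) * np.array([3.0, 1.0])
basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
curves = amplitudes @ basis + 0.05 * rng.normal(size=(30, 20))

scores, components, explained = fpca_scores(curves, n_components=2)
```

Here each subject's trajectory is reduced to a short vector of scores, which could then be passed to a downstream classifier alongside features from cross-sectional views. Irregular or sparse sampling would need a smoothing-based FPCA, which this sketch does not cover.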