CANTARE: finding and visualizing network-based multi-omic predictive models.

Author information

Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.

Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.

Publication information

BMC Bioinformatics. 2021 Feb 19;22(1):80. doi: 10.1186/s12859-021-04016-8.

Abstract

BACKGROUND

One goal of multi-omic studies is to identify interpretable predictive models for outcomes of interest, with analytes drawn from multiple omes. Such findings could support refined biological insight and hypothesis generation. However, standard analytical approaches are not designed to be "ome aware." Thus, some researchers analyze data from one ome at a time, and then combine predictions across omes. Others resort to correlation studies, cataloging pairwise relationships, but lacking an obvious approach for cohesive and interpretable summaries of these catalogs.

METHODS

We present a novel workflow for building predictive regression models from network neighborhoods in multi-omic networks. First, we generate pairwise regression models across all pairs of analytes from all omes, encoding the resulting "top table" of relationships in a network. Then, we build predictive logistic regression models using the analytes in network neighborhoods of interest. We call this method CANTARE (Consolidated Analysis of Network Topology And Regression Elements).
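The two-step workflow described above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' R implementation. Everything named here is an assumption made for illustration: the toy analytes (`microbe_A`, `metabolite_B`, `enzyme_C`), the helper functions, and the edge rule (an R² cutoff on a simple one-predictor fit). That edge rule is a deliberate simplification which amounts to a correlation-style network, whereas the abstract describes pairwise models that include an interaction term for IBD.

```python
import math
import random
from itertools import combinations


def make_toy_data(n_per_group=20, seed=0):
    """Hypothetical data: two analytes that track the outcome, one that is pure noise."""
    rng = random.Random(seed)
    labels = [0] * n_per_group + [1] * n_per_group
    microbe = [2 * y + rng.gauss(0, 0.5) for y in labels]
    metabolite = [m + rng.gauss(0, 0.3) for m in microbe]
    enzyme = [rng.gauss(0, 1) for _ in labels]
    data = {"microbe_A": microbe, "metabolite_B": metabolite, "enzyme_C": enzyme}
    return data, labels


def r_squared(x, y):
    """R^2 of a one-predictor least-squares fit (the squared Pearson correlation)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def build_network(data, r2_cutoff=0.5):
    """Step 1: link two analytes when their pairwise fit passes the (assumed) cutoff."""
    edges = {name: set() for name in data}
    for a, b in combinations(data, 2):
        if r_squared(data[a], data[b]) >= r2_cutoff:
            edges[a].add(b)
            edges[b].add(a)
    return edges


def neighborhood(edges, center):
    """Step 2a: a node together with its direct neighbors, as a sorted list."""
    return sorted({center} | edges[center])


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Step 2b: gradient-descent logistic regression; returns [intercept, coef_1, ...]."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for row, y in zip(rows, labels):
            err = predict(weights, row) - y
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def predict(weights, row):
    """Predicted probability of the positive class for one sample."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
    return 1 / (1 + math.exp(-z))


# Usage: build the network, take one neighborhood, fit a model on its analytes.
data, labels = make_toy_data()
edges = build_network(data)
predictors = neighborhood(edges, "microbe_A")
rows = [[data[name][i] for name in predictors] for i in range(len(labels))]
weights = fit_logistic(rows, labels)
```

Because the model is an ordinary logistic regression on a handful of neighborhood analytes, each coefficient in `weights` carries a sign and a magnitude, which is the kind of directional effect size the abstract emphasizes.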

RESULTS

We applied CANTARE to previously published data from healthy controls and patients with inflammatory bowel disease (IBD) consisting of three omes: gut microbiome, metabolomics, and microbial-derived enzymes. We identified 8 unique predictive models with AUC > 0.90. The number of predictors in these models ranged from 3 to 13. We compare the results of CANTARE to random forests and elastic-net penalized regressions, analyzing AUC, predictions, and predictors. CANTARE AUC values were competitive with those generated by random forests and penalized regressions. The top 3 CANTARE models had a greater dynamic range of predicted probabilities than did random forests and penalized regressions (p-value = 1.35 × 10). CANTARE models were significantly more likely to prioritize predictors from multiple omes than were the alternatives (p-value = 0.005). We also showed that predictive models from a network based on pairwise models with an interaction term for IBD have higher AUC than predictive models built from a correlation network (p-value = 0.016). R scripts and a CANTARE User's Guide are available at https://sourceforge.net/projects/cytomelodics/files/CANTARE/ .
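For readers less familiar with the AUC metric used to rank models here, the following is a minimal Python sketch of one standard way to compute it: the fraction of case/control pairs in which the case receives the higher predicted probability, with ties counted as one half. The function name `auc` and the example scores are hypothetical, and the abstract does not state how the authors computed AUC.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    scores: predicted probabilities; labels: 1 for case, 0 for control.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # A case "wins" a pair when it outscores the control; a tie counts as half a win.
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Made-up scores: three of the four case/control pairs are ordered correctly.
example = auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])  # 0.75
```

This pairwise form is quadratic in the number of samples, which is acceptable for cohorts of the size typical in multi-omic studies; a rank-based formula gives the same value more efficiently on large data sets.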

CONCLUSION

CANTARE offers a flexible approach for building parsimonious, interpretable multi-omic models. These models yield quantitative and directional effect sizes for predictors and support the generation of hypotheses for follow-up investigation.
