Phenotype prediction using biologically interpretable neural networks on multi-cohort multi-omics data.

Author information

Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands.

Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands.

Publication information

NPJ Syst Biol Appl. 2024 Aug 2;10(1):81. doi: 10.1038/s41540-024-00405-w.

Abstract

Integrating multi-omics data into predictive models has the potential to enhance accuracy, which is essential for precision medicine. In this study, we developed interpretable predictive models for multi-omics data by employing neural networks informed by prior biological knowledge, referred to as visible networks. These neural networks offer insights into the decision-making process and can unveil novel perspectives on the underlying biological mechanisms associated with traits and complex diseases. We tested the performance, interpretability and generalizability for inferring smoking status, subject age and LDL levels using genome-wide RNA expression and CpG methylation data from the blood of the BIOS consortium (four population cohorts, N = 2940). In a cohort-wise cross-validation setting, the consistency of the diagnostic performance and interpretation was assessed. Performance was consistently high for predicting smoking status with an overall mean AUC of 0.95 (95% CI: 0.90-1.00) and interpretation revealed the involvement of well-replicated genes such as AHRR, GPR15 and LRRN3. LDL-level predictions were only generalized in a single cohort with an R² of 0.07 (95% CI: 0.05-0.08). Age was inferred with a mean error of 5.16 (95% CI: 3.97-6.35) years with the genes COL11A2, AFAP1, OTUD7A, PTPRN2, ADARB2 and CD34 consistently predictive. For both regression tasks, we found that using multi-omics networks improved performance, stability and generalizability compared to interpretable single omic networks. We believe that visible neural networks have great potential for multi-omics analysis; they combine multi-omic data elegantly, are interpretable, and generalize well to data from different cohorts.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b8ed/11297229/ec7c9e701dde/41540_2024_405_Fig1_HTML.jpg