Bipartite networks represent causality better than simple networks: evidence, algorithms, and applications.

Authors

Shen Bingran, Curozzi Gloria, Shasha Dennis

Affiliations

Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, United States.

Center for Genomics and Systems Biology, Department of Biology, New York University, New York, United States.

Publication information

Front Genet. 2024 May 9;15:1371607. doi: 10.3389/fgene.2024.1371607. eCollection 2024.

Abstract

A network whose nodes are genes and whose directed edges represent the positive or negative influence of a regulatory gene on its targets is often used as a representation of causality. To infer a network, researchers often develop a machine learning model and then evaluate the model based on its match with experimentally verified "gold standard" edges. The desired result of such a model is a network that may extend the gold standard edges. Since networks are a form of visual representation, one can compare their utility with that of architectural or machine blueprints. Blueprints are clearly useful because they provide precise guidance to builders in construction. If the primary role of gene regulatory networks is to characterize causality, then such networks should be good tools of prediction because prediction is the actionable benefit of knowing causality. But are they? In this paper, we compare the prediction quality of models based on "gold standard" regulatory edges from previous experimental work with that of non-linear models inferred from time series data across four different species. We show that the same non-linear machine learning models predict better when inferred from the time series data, reducing the root mean square error (RMSE) by 5.3% to 25.3% compared with the same models based on the gold standard edges. Having established that networks fail to characterize causality properly, we suggest that causality research should focus on four goals: (i) predictive accuracy; (ii) a parsimonious enumeration of predictive regulatory genes for each target gene; (iii) the identification, for each target, of disjoint sets of predictive regulatory genes of roughly equal accuracy; and (iv) the construction of a bipartite network representation of causality, whose node types are genes and models. We provide algorithms for all goals.
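
To make the RMSE comparison and goals (iii) and (iv) more concrete, the following is a minimal Python sketch. It is not the authors' implementation. It assumes a bipartite structure in which each model node takes a set of regulator genes as input and predicts one target gene. The class name, the gene and model names, and the greedy selection of disjoint regulator sets are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def rmse_reduction_percent(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to a baseline model."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


class BipartiteCausalNetwork:
    """Hypothetical bipartite graph with two node types: genes and models.

    Edges connect genes to models only: regulator genes feed into a model,
    and the model points to the single target gene it predicts.
    """

    def __init__(self):
        self.model_inputs = {}    # model name -> frozenset of regulator genes
        self.model_target = {}    # model name -> target gene
        self.models_for_target = defaultdict(list)

    def add_model(self, name, regulators, target):
        self.model_inputs[name] = frozenset(regulators)
        self.model_target[name] = target
        self.models_for_target[target].append(name)

    def disjoint_models(self, target):
        """Greedily keep models for `target` whose regulator sets do not overlap.

        This greedy pass is an assumption of this sketch, not the paper's algorithm.
        """
        chosen, used = [], set()
        for name in self.models_for_target[target]:
            if self.model_inputs[name].isdisjoint(used):
                chosen.append(name)
                used |= self.model_inputs[name]
        return chosen


# Usage with made-up gene and model names.
net = BipartiteCausalNetwork()
net.add_model("m1", {"geneA", "geneB"}, "geneT")
net.add_model("m2", {"geneB", "geneC"}, "geneT")
net.add_model("m3", {"geneD"}, "geneT")
print(net.disjoint_models("geneT"))        # ['m1', 'm3']
print(rmse_reduction_percent(2.0, 1.5))    # 25.0
```

Placing models in the graph as their own node type lets several alternative regulator sets for the same target coexist, which a simple gene-to-gene edge list cannot express directly.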

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2921/11120958/2a2b45c2e780/fgene-15-1371607-g001.jpg
