Addressing false discoveries in network inference.

Authors

Petri Tobias, Altmann Stefan, Geistlinger Ludwig, Zimmer Ralf, Küffner Robert

Affiliations

Ludwig-Maximilians-Universität München, Institut für Informatik, Munich, Germany.

Ludwig-Maximilians-Universität München, Institut für Informatik, Munich, Germany and Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany.

Publication information

Bioinformatics. 2015 Sep 1;31(17):2836-43. doi: 10.1093/bioinformatics/btv215. Epub 2015 Apr 24.

Abstract

MOTIVATION

Experimentally determined gene regulatory networks can be enriched by computational inference from high-throughput expression profiles. However, the prediction of regulatory interactions is severely impaired by indirect and spurious effects, particularly for eukaryotes. Recently published methods report improved predictions by exploiting the a priori known targets of a regulator (its local topology) in addition to expression profiles.

RESULTS

We find that methods exploiting known targets show an unexpectedly high rate of false discoveries. This leads to inflated performance estimates and the prediction of an excessive number of new interactions for regulators with many known targets. These issues are hidden from common evaluation and cross-validation setups, which is due to Simpson's paradox. We suggest a confidence score recalibration method (CoRe) that reduces the false discovery rate and enables a reliable performance estimation.
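The role of Simpson's paradox here can be illustrated with a toy calculation. The sketch below is not the authors' analysis or the CoRe method; the regulator names, scores and counts are hypothetical and chosen only to make the effect visible. It assumes a method that assigns one score to every candidate target of a regulator, high for a regulator with many known targets and low for one with few, so the score carries no information about which genes of a given regulator are true targets. Evaluated over all regulator-gene pairs pooled together, the score still appears predictive, while within each regulator it is no better than chance.

```python
from itertools import product


def auc(scores, labels):
    """Chance that a random true interaction outscores a random false one.

    Ties count as 0.5 (Mann-Whitney formulation of the ROC AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def make_pairs(regulator, score, n_true, n_false):
    """Hypothetical toy data: every candidate of a regulator gets the same score."""
    return [(regulator, score, True)] * n_true + [(regulator, score, False)] * n_false


# Assumed numbers: a "hub" with many true targets and a "small" regulator with few.
pairs = make_pairs("hub", 0.9, 40, 60) + make_pairs("small", 0.1, 2, 98)

# Pooled evaluation: all regulator-gene pairs ranked together.
pooled_auc = auc([s for _, s, _ in pairs], [y for _, _, y in pairs])

# Stratified evaluation: each regulator ranked on its own.
per_regulator_auc = {
    reg: auc(
        [s for r, s, _ in pairs if r == reg],
        [y for r, _, y in pairs if r == reg],
    )
    for reg in ("hub", "small")
}

print(f"pooled AUC: {pooled_auc:.3f}")
print(f"per-regulator AUC: {per_regulator_auc}")
```

With these assumed counts the pooled AUC is about 0.79, whereas the AUC within each regulator is exactly 0.5. The pooled figure reflects only that the hub has a higher base rate of true targets, which is the kind of inflated estimate that a pooled evaluation or cross-validation setup cannot expose.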

CONCLUSIONS

CoRe considerably improves the results of network inference methods that exploit known targets. Predictions then display the biological process specificity of regulators more correctly and enable the inference of accurate genome-wide regulatory networks in eukaryotes. For yeast, we propose a network with more than 22 000 confident interactions. We point out that machine learning approaches outside of the area of network inference may be affected as well.

AVAILABILITY AND IMPLEMENTATION

Results, executable code and networks are available via our website http://www.bio.ifi.lmu.de/forschung/CoRe.

CONTACT

robert.kueffner@helmholtz-muenchen.de

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
