Conformal novelty detection for multiple metabolic networks.

Affiliations

LPSM, Sorbonne university, 4 place Jussieu, 75005, Paris, France.

MaIAGE, INRAE, Domaine de Vilvert, 78350, Jouy-en-Josas, France.

Publication information

BMC Bioinformatics. 2024 Nov 16;25(1):358. doi: 10.1186/s12859-024-05971-8.

Abstract

BACKGROUND

Graphical representations are useful for modeling complex data in general and biological interactions in particular. Our main motivation is the comparison of metabolic networks in the wider context of developing noninvasive, accurate diagnostic tools. However, comparison and classification of graphs remain extremely challenging, although a number of highly efficient methods, such as graph neural networks, have been developed in the past decade. Important aspects are still lacking in graph classification: interpretability and guarantees on classification quality, i.e., control of the risk level or of the false discovery rate.

RESULTS

In our contribution, we introduce a statistically sound approach to control the false discovery rate in a classification task for graphs in a semi-supervised setting. Our procedure identifies novelties in a dataset, where a graph is considered to be a novelty when its topology is significantly different from those in the reference class. Notably, the procedure is a conformal prediction approach, which makes no distributional assumptions on the data and can be seen as a wrapper around traditional machine learning models, so that it takes full advantage of existing methods. The performance of the proposed method is assessed on several standard benchmarks. It is also adapted and applied to the difficult task of classifying metabolic networks, where each graph is a representation of all metabolic reactions of a bacterium, and to a real task from a cancer data repository.
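
To make the idea of a conformal wrapper with false discovery rate control concrete, the following is a minimal sketch of a generic conformal novelty detection pipeline. It is not the authors' procedure. It assumes that some upstream model (hypothetical here, for example a classifier applied to graph features) has already produced a real-valued novelty score for each graph, with higher scores meaning "more unlike the reference class". The function names `conformal_pvalues` and `benjamini_hochberg` are illustrative. Conformal p-values are computed against a held-out calibration set of reference scores and then passed to the standard Benjamini-Hochberg step-up rule at a target level `alpha`.

```python
import numpy as np


def conformal_pvalues(cal_scores, test_scores):
    """Conformal p-value of each test score against reference calibration scores.

    Assumption: a higher score means the item looks more like a novelty.
    """
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    # Count calibration scores that are >= each test score.
    n_ge = len(cal) - np.searchsorted(cal, test, side="left")
    return (1.0 + n_ge) / (len(cal) + 1.0)


def benjamini_hochberg(pvals, alpha=0.1):
    """Boolean mask of items declared novelties by the BH step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Largest rank k whose sorted p-value falls under its threshold.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject


if __name__ == "__main__":
    # Synthetic scores standing in for the output of a hypothetical scorer.
    rng = np.random.default_rng(0)
    cal_scores = rng.normal(0.0, 1.0, size=1000)        # reference class only
    test_scores = np.concatenate([
        rng.normal(0.0, 1.0, size=90),                  # reference-like items
        rng.normal(4.0, 1.0, size=10),                  # simulated novelties
    ])
    pvals = conformal_pvalues(cal_scores, test_scores)
    detected = benjamini_hochberg(pvals, alpha=0.1)
    print("number of detections:", int(detected.sum()))
```

In this sketch the scorer is treated as a black box, which is the sense in which such a procedure wraps existing machine learning models: only the ranks of the test scores relative to the calibration scores enter the p-values.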

CONCLUSIONS

Our approach efficiently controls the false discovery rate in highly complex data while maximizing the true discovery rate to obtain the most reasonable predictive performance. This contribution focuses on confident classification of complex data, which can be further used to explore complex human pathologies and their mechanisms.
