
CausNet: generational orderings based search for optimal Bayesian networks via dynamic programming with parent set constraints.

Affiliation

Division of Biostatistics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, USA.

Publication information

BMC Bioinformatics. 2023 Feb 14;24(1):46. doi: 10.1186/s12859-023-05159-6.

Abstract

BACKGROUND

Finding a globally optimal Bayesian network by exhaustive search is a problem of super-exponential complexity, which severely restricts the number of variables that can feasibly be included. We implement a dynamic programming based algorithm with built-in dimensionality reduction and parent set identification. This reduces the search space substantially and allows the algorithm to be applied to high-dimensional data. We use what we call a 'generational orderings' based search for optimal networks, a novel way to efficiently search the space of possible networks given the possible parent sets. The algorithm supports both continuous and categorical data, as well as continuous, binary and survival outcomes.
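To make the "super-exponential" claim concrete, the number of labeled directed acyclic graphs on n nodes can be computed with Robinson's recurrence, a standard combinatorial result that is not part of the abstract itself. The following is a minimal Python sketch; the function name `count_dags` is chosen here for illustration only.

```python
from math import comb


def count_dags(n: int) -> int:
    """Count labeled DAGs on n nodes using Robinson's recurrence.

    a(0) = 1
    a(m) = sum_{k=1..m} (-1)^(k+1) * C(m, k) * 2^(k*(m-k)) * a(m-k)
    """
    counts = [1]  # exactly one (empty) DAG on zero nodes
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            sign = 1 if k % 2 == 1 else -1
            total += sign * comb(m, k) * 2 ** (k * (m - k)) * counts[m - k]
        counts.append(total)
    return counts[n]


if __name__ == "__main__":
    for n in (3, 4, 5, 10):
        print(n, count_dags(n))
```

The count is already 25 for three nodes and 29,281 for five, which is why exhaustive enumeration stops being practical after a handful of variables.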

RESULTS

We demonstrate the efficacy of our algorithm on both synthetic and real data. In simulations, our algorithm performs better than three state-of-the-art algorithms that are currently used extensively. We then apply it to an ovarian cancer gene expression dataset with 513 genes and a survival outcome. Our algorithm finds an optimal network describing the disease pathway, consisting of 6 genes leading to the outcome node, in just 3.4 min on a personal computer with a 2.3 GHz Intel Core i9 processor and 16 GB of RAM.

CONCLUSIONS

Our generational orderings based search for optimal networks is an efficient and highly scalable approach for finding optimal Bayesian networks and can be applied to thousands of variables. Using specifiable parameters (correlation, FDR cutoffs, and in-degree), one can increase or decrease the number of nodes and the density of the networks. The availability of two scoring options (BIC and BGe), together with an implementation for survival outcomes and mixed data types, makes our algorithm well suited to many types of high-dimensional data in a variety of fields.
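The general idea of dynamic programming over node subsets with restricted parent sets and an in-degree cap can be sketched as follows. This is a generic sink-based subset recursion written for illustration, not the generational orderings search of CausNet, whose details are not given in the abstract. All names (`optimal_network`, `best_parents`, `toy_score`) and the toy score table are hypothetical; in practice the local score would be something like BIC or BGe computed from data.

```python
from itertools import combinations


def best_parents(node, allowed, candidates, max_in_degree, local_score):
    """Best (score, parent set) for node, using only candidate parents in `allowed`."""
    pool = [p for p in candidates[node] if p in allowed]
    best = (local_score(node, ()), ())
    for k in range(1, max_in_degree + 1):
        for parents in combinations(pool, k):
            score = local_score(node, parents)
            if score > best[0]:
                best = (score, parents)
    return best


def optimal_network(nodes, candidates, max_in_degree, local_score):
    """Highest-scoring DAG over `nodes`, found by choosing a sink for each subset."""
    # best[S] = (score, {node: parents}) of the best DAG on node subset S
    best = {frozenset(): (0.0, {})}
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            current = frozenset(subset)
            top = None
            for sink in subset:
                rest = current - {sink}
                score, parents = best_parents(
                    sink, rest, candidates, max_in_degree, local_score
                )
                total = best[rest][0] + score
                if top is None or total > top[0]:
                    top = (total, {**best[rest][1], sink: parents})
            best[current] = top
    return best[frozenset(nodes)]


# Hypothetical local scores (higher is better); unlisted parent sets get a
# default score that penalizes each extra parent.
TOY_SCORES = {("B", ("A",)): -4.0, ("C", ("B",)): -5.0, ("A", ("B",)): -6.0}


def toy_score(node, parents):
    return TOY_SCORES.get((node, tuple(sorted(parents))), -10.0 - len(parents))


if __name__ == "__main__":
    candidates = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
    print(optimal_network(["A", "B", "C"], candidates, 2, toy_score))
```

Restricting `candidates` (for example by a correlation or FDR filter) and lowering `max_in_degree` both shrink the number of parent sets scored per node, which mirrors the tunable parameters the abstract mentions. The subset recursion itself is still exponential in the number of nodes, so a sketch like this is only practical for small node sets.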


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b16/9926787/702ac5a9a5f2/12859_2023_5159_Fig1_HTML.jpg
