Mærsk Mc-Kinney Møller Institute, University of Southern Denmark, Campusvej 55, Odense, 5230, Denmark.
Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA; School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Iran.
Comput Biol Med. 2024 Dec;183:109238. doi: 10.1016/j.compbiomed.2024.109238. Epub 2024 Oct 19.
In bioinformatics, inferring the structure of a Gene Regulatory Network (GRN) from incomplete gene expression data is a difficult task. One popular method for inferring the structure of GRNs is the Path Consistency Algorithm based on Conditional Mutual Information (PCA-CMI). Although PCA-CMI excels at extracting GRN skeletons, it struggles with missing values in datasets. As a result, applying PCA-CMI to infer GRNs requires a preprocessing step for data imputation. In this paper, we present the GAEM algorithm, which uses an iterative approach combining a Genetic Algorithm with Expectation-Maximization to infer the structure of a GRN from incomplete gene expression datasets. GAEM learns the GRN structure from the incomplete dataset and iteratively updates the imputed values based on the learnt GRN until the convergence criteria are met. We evaluate the performance of this algorithm under various missingness mechanisms (ignorable and nonignorable) and missingness percentages (5%, 15%, and 40%). The traditional approach to handling missing values in gene expression datasets estimates them first and then constructs the GRN. In contrast, our method updates both the missing values and the GRN iteratively until convergence. Results on the DREAM3 dataset suggest that GAEM is the more reliable method overall. Especially for smaller network sizes, GAEM outperforms methods that impute the incomplete dataset first and then learn the GRN structure from the imputed data. We have implemented the GAEM algorithm in the GAEM R package, which is available at the following GitHub repository: https://github.com/parniSDU/GAEM.
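To make the alternating idea concrete, the following Python sketch shows a loop that switches between learning a skeleton and re-imputing missing entries from it. It is an illustration under stated assumptions, not the GAEM algorithm and not the API of the GAEM R package: the Genetic Algorithm and Expectation-Maximization steps are replaced here by a plain least-squares re-imputation from each gene's current neighbours, conditional mutual information is computed under a Gaussian assumption, and conditioning sets are limited to order 1. All function names, the threshold of 0.05, the tolerance, and the toy data are hypothetical.

```python
from itertools import combinations

import numpy as np


def gaussian_cmi(data, i, j, cond=()):
    """I(i; j | cond) under a Gaussian assumption, via covariance log-determinants."""
    def logdet(idx):
        if len(idx) == 0:
            return 0.0
        cov = np.atleast_2d(np.cov(data[:, list(idx)], rowvar=False))
        return np.linalg.slogdet(cov)[1]

    cond = tuple(cond)
    return 0.5 * (logdet((i,) + cond) + logdet((j,) + cond)
                  - logdet(cond) - logdet((i, j) + cond))


def learn_skeleton(data, threshold=0.05, max_order=1):
    """Simplified path-consistency pass: drop edges whose (conditional) MI is small."""
    p = data.shape[1]
    adj = ~np.eye(p, dtype=bool)  # start from the complete graph
    for order in range(max_order + 1):
        for i in range(p):
            for j in range(i + 1, p):
                if not adj[i, j]:
                    continue
                common = [k for k in range(p) if adj[i, k] and adj[j, k]]
                for cond in combinations(common, order):
                    if gaussian_cmi(data, i, j, cond) < threshold:
                        adj[i, j] = adj[j, i] = False
                        break
    return adj


def impute_from_graph(data, mask, adj):
    """Assumed stand-in for the GA/EM step: regress each gene on its neighbours."""
    out = data.copy()
    for g in range(data.shape[1]):
        miss = mask[:, g]
        nbrs = np.flatnonzero(adj[g])
        if not miss.any() or miss.all() or nbrs.size == 0:
            continue
        X = np.column_stack([np.ones(len(data)), data[:, nbrs]])
        coef, *_ = np.linalg.lstsq(X[~miss], data[~miss, g], rcond=None)
        out[miss, g] = X[miss] @ coef
    return out


def iterative_grn(data_with_nan, threshold=0.05, max_iter=20, tol=1e-4):
    """Alternate skeleton learning and imputation until both stop changing."""
    mask = np.isnan(data_with_nan)
    data = np.where(mask, np.nanmean(data_with_nan, axis=0), data_with_nan)
    adj = learn_skeleton(data, threshold)
    for _ in range(max_iter):
        new_data = impute_from_graph(data, mask, adj)
        new_adj = learn_skeleton(new_data, threshold)
        converged = (np.abs(new_data - data).max() < tol
                     and np.array_equal(new_adj, adj))
        data, adj = new_data, new_adj
        if converged:
            break
    return adj, data


# Toy data (hypothetical): a chain x0 -> x1 -> x2 plus an unrelated gene x3.
rng = np.random.default_rng(0)
n = 200
x0 = rng.normal(size=n)
x1 = x0 + 0.3 * rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)
x3 = rng.normal(size=n)
expr = np.column_stack([x0, x1, x2, x3])
observed = expr.copy()
observed[rng.random(expr.shape) < 0.10] = np.nan  # about 10% missing at random

adj, completed = iterative_grn(observed)
print(adj.astype(int))
```

The contrast with the impute-then-learn baseline is the loop in `iterative_grn`: the baseline would stop after the initial mean imputation and a single call to `learn_skeleton`, whereas the loop feeds each learnt skeleton back into the next imputation.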