Health Innovation and Transformation Centre, Federation University, Victoria, Australia.
Biosystems. 2022 Nov;221:104757. doi: 10.1016/j.biosystems.2022.104757. Epub 2022 Aug 22.
Reconstructing Gene Regulatory Networks (GRNs) from time series gene expression data is highly relevant to the discovery of complex biological interactions and dynamics. Various computational strategies have been developed for this task, but most have low computational efficiency and cannot cope with high-dimensional gene expression data that has few samples. In this paper, we introduce a novel combined filter feature selection approach for efficient and accurate inference of GRNs, and use a Boolean framework for network modelling to demonstrate its efficacy. Using discretized microarray expression data, the genes most relevant to each target gene are first filtered using ReliefF, an instance-based feature ranking method applied here for the first time to GRN inference. Further gene selection from the filtered gene list is then performed using a mutual information-based min-redundancy max-relevance criterion, which eliminates irrelevant genes. This combined method is executed on resampled datasets to finalize the optimal set of regulatory genes. Building on our previous research, a Pearson correlation coefficient-based Boolean modelling approach is then used to efficiently identify the optimal regulatory rules associated with the selected regulatory genes. The proposed approach was evaluated on gene expression datasets from small-scale and medium-scale real gene networks. It was more effective than Linear Discriminant Analysis and performed better than the individual feature selection methods. Compared with other state-of-the-art methods, it obtained improved Structural Accuracy with a higher number of true positives, and it outperformed them with respect to Dynamic Accuracy and efficiency.
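The two-stage filter described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the Relief step is a simplified single-neighbour variant rather than full ReliefF, the mRMR step uses the common "relevance minus mean redundancy" greedy form, the resampling step and the Boolean rule search are omitted, and all function names, parameters, and the toy data are hypothetical. Rows of `X` stand for discretized expression states at time t, and `y` stands for the target gene's state at time t+1.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def relief_scores(X, y):
    """Simplified Relief for discrete data: one nearest hit and one nearest miss
    per sample (assumption: full ReliefF averages over k neighbours instead)."""
    n, m = len(X), len(X[0])
    w = [0.0] * m
    for i in range(n):
        def dist(j, i=i):
            # Hamming distance between samples i and j
            return sum(a != b for a, b in zip(X[i], X[j]))
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        hit, miss = min(hits, key=dist), min(misses, key=dist)
        for f in range(m):
            # Reward features that differ on the miss, penalise those that differ on the hit
            w[f] += ((X[i][f] != X[miss][f]) - (X[i][f] != X[hit][f])) / n
    return w


def mrmr(X, y, candidates, k):
    """Greedy mRMR: repeatedly pick the candidate with the highest
    relevance minus mean redundancy with the already selected features."""
    cols = {f: [row[f] for row in X] for f in candidates}
    relevance = {f: mutual_information(cols[f], y) for f in candidates}
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(mutual_information(cols[f], cols[s]) for s in selected)
            return relevance[f] - redundancy / len(selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def select_regulators(X, y, n_filter, n_select):
    """Stage 1: keep the n_filter top-ranked genes by Relief score.
    Stage 2: choose n_select of them with mRMR."""
    w = relief_scores(X, y)
    ranked = sorted(range(len(w)), key=lambda f: w[f], reverse=True)
    return mrmr(X, y, ranked[:n_filter], n_select)


# Hypothetical toy data: gene 1 duplicates gene 0, gene 3 is constant,
# and the target's next state is (gene 0 AND gene 2).
X = [[0, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [1, 1, 1, 0]] * 2
y = [row[0] & row[2] for row in X]

print(select_regulators(X, y, n_filter=3, n_select=2))
```

In this toy case the Relief stage gives the constant gene a score of zero, and the mRMR stage avoids selecting both copies of the duplicated gene. How the published method sets the filter size, the number of neighbours, and the resampling scheme is not stated in the abstract, so those choices are left as parameters here.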