Wani Nisar, Raza Khalid
Govt. Degree College Baramulla, Jammu & Kashmir, India.
Department of Computer Science, Jamia Millia Islamia, New Delhi, India.
PeerJ Comput Sci. 2021 Jan 28;7:e363. doi: 10.7717/peerj-cs.363. eCollection 2021.
High-throughput multi-omics data generation, coupled with the fusion of heterogeneous genomic data, is defining new ways to build computational inference models. These models are scalable, can support very large genome sizes, and have the added advantage of exploiting additional biological knowledge from the integration framework. However, such an arrangement has a major limitation: the huge computational cost of learning from very large datasets in a sequential execution environment. To overcome this issue, we present a multiple kernel learning (MKL) based gene regulatory network (GRN) inference approach in which multiple heterogeneous datasets are fused using the MKL paradigm. We formulate GRN learning as a supervised classification problem, in which genes regulated by a specific transcription factor are separated from non-regulated genes. A parallel execution architecture is devised to learn a large-scale GRN by decomposing the initial classification problem into a number of subproblems that run as multiple processes on a multi-processor machine. We evaluate the approach in terms of speedup and inference potential using genomic data from , and . The results demonstrate that the proposed method achieves better classification accuracy and greater speedup than other state-of-the-art methods when learning large-scale GRNs from multiple heterogeneous datasets.
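The per-transcription-factor decomposition described above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the authors' implementation. It assumes one Gaussian (RBF) kernel per data source and combines them with fixed convex weights, whereas a real MKL solver would learn those weights. It also uses a kernel perceptron as a simple stand-in for the kernel classifier. Each transcription factor (TF) becomes one independent binary subproblem (known targets labeled +1, known non-targets labeled -1), and the subproblems are distributed over worker processes with `concurrent.futures.ProcessPoolExecutor`. All function names (`rbf_kernel`, `combine_kernels`, `solve_subproblem`, `infer_grn`) are illustrative.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def rbf_kernel(X, gamma):
    """Gaussian kernel matrix for one data source; X is a list of per-gene feature vectors."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = K[j][i] = math.exp(-gamma * d2)
    return K


def combine_kernels(kernels, weights):
    """Convex combination sum_m w_m * K_m.

    Assumption: weights are fixed here; an MKL solver would learn them from the labels.
    """
    total = sum(weights)
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) / total
             for j in range(n)] for i in range(n)]


def train_kernel_perceptron(K, y, epochs=20):
    """Dual (kernel) perceptron on a precomputed kernel; y[i] is +1 or -1.

    Stand-in for an SVM: returns mistake counts alpha used as dual coefficients.
    """
    n = len(y)
    alpha = [0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            f = sum(alpha[j] * y[j] * K[j][i] for j in range(n))
            if y[i] * f <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # training set separated, stop early
            break
    return alpha


def solve_subproblem(task):
    """One TF = one binary classification subproblem over the shared kernel."""
    tf, K, labels = task  # labels: {gene_index: +1 or -1} for labeled genes only
    train = sorted(labels)
    y = [labels[g] for g in train]
    K_train = [[K[a][b] for b in train] for a in train]
    alpha = train_kernel_perceptron(K_train, y)
    predicted = []
    for g in range(len(K)):
        if g in labels:
            continue  # only score genes whose status is unknown
        score = sum(alpha[k] * y[k] * K[train[k]][g] for k in range(len(train)))
        if score > 0:
            predicted.append(g)
    return tf, predicted


def infer_grn(K, tf_labels, n_workers=1):
    """Run all per-TF subproblems, sequentially or in n_workers processes.

    Returns {tf: [indices of unlabeled genes predicted as targets]}.
    """
    tasks = [(tf, K, labels) for tf, labels in tf_labels.items()]
    if n_workers == 1:
        return dict(map(solve_subproblem, tasks))
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return dict(pool.map(solve_subproblem, tasks))
```

For example, `infer_grn(K, {"TF_A": {0: 1, 1: 1, 3: -1, 4: -1}}, n_workers=4)` returns, for each TF, the indices of unlabeled genes predicted as its targets; when `n_workers > 1`, the call should sit under an `if __name__ == "__main__":` guard. Because the subproblems share nothing except the read-only kernel, they parallelize without coordination. Shipping the full kernel with every task is the simplest choice for a sketch, though a larger implementation would likely send it once per worker (for instance through a pool initializer).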