Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain.
Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain.
Comput Biol Med. 2024 Sep;179:108850. doi: 10.1016/j.compbiomed.2024.108850. Epub 2024 Jul 15.
Gene Regulatory Network (GRN) inference is a fundamental task in biology and medicine, as it enables a deeper understanding of the intricate mechanisms of gene expression in organisms. This bioinformatics problem has been addressed in the literature through multiple computational approaches. Techniques for inferring GRNs from expression data have employed Bayesian networks, ordinary differential equations (ODEs), machine learning, information-theoretic measures and neural networks, among others. The diversity of implementations and their respective customizations has led to the emergence of many tools, each with its own specialized domain, understood as a subset of networks with specific characteristics that are difficult to detect a priori. This specialization introduces significant uncertainty when choosing the most appropriate technique for a particular dataset. This proposal, named MO-GENECI, builds on the basic idea of its predecessor GENECI and optimizes the consensus among different inference techniques through a carefully refined multi-objective evolutionary algorithm guided by several objective functions linked to the biological context at hand.
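To make the idea of a consensus among inference techniques concrete, the following minimal Python sketch combines the edge confidences produced by several techniques into one network by a normalized weighted sum. It is an illustration under stated assumptions, not the MO-GENECI implementation: the function name `weighted_consensus`, the dictionary representation of a network, and the example weights are all hypothetical. In an evolutionary setting, a weight vector of this kind would be the candidate solution that the algorithm adjusts.

```python
from typing import Dict, List, Tuple

# An edge is a (regulator, target) pair; this representation is an assumption.
Edge = Tuple[str, str]


def weighted_consensus(
    predictions: List[Dict[Edge, float]], weights: List[float]
) -> Dict[Edge, float]:
    """Combine per-technique edge confidences into one consensus network.

    Weights are normalized to sum to 1. An edge that a technique does not
    report contributes a confidence of 0 for that technique.
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per technique")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    normalized = [w / total for w in weights]

    consensus: Dict[Edge, float] = {}
    for prediction, weight in zip(predictions, normalized):
        for edge, confidence in prediction.items():
            consensus[edge] = consensus.get(edge, 0.0) + weight * confidence
    return consensus


# Hypothetical outputs of two inference techniques on a three-gene example.
technique_a = {("G1", "G2"): 0.9, ("G2", "G3"): 0.2}
technique_b = {("G1", "G2"): 0.5, ("G3", "G1"): 0.8}

# Technique A is trusted three times as much as technique B.
consensus = weighted_consensus([technique_a, technique_b], [3.0, 1.0])
```

With these illustrative weights, the edge reported by both techniques keeps a high confidence, while edges reported by only one technique are scaled down in proportion to that technique's weight.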
MO-GENECI has been tested on an extensive and diverse academic benchmark of 106 gene regulatory networks of multiple sizes drawn from multiple sources. Its performance was compared with that of the individual techniques using two key metrics for GRN inference: the area under the ROC curve (AUROC) and the area under the precision-recall curve (AUPR). Friedman's statistical ranking provided an ordered classification, followed by non-parametric Holm tests to determine statistical significance.
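As a rough illustration of the two metrics, the sketch below scores a ranked list of candidate edges against a gold standard using only the Python standard library. It assumes binary labels (1 for a true regulatory edge, 0 otherwise), computes AUROC as the fraction of correctly ordered positive/negative pairs, and approximates AUPR by average precision, which is one common estimator and not necessarily the one used in the paper. The function names and the toy data are hypothetical.

```python
from typing import List


def auroc(labels: List[int], scores: List[float]) -> float:
    """AUROC as the probability that a true edge outranks a false one.

    Tied scores count as half a correctly ordered pair.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    ordered = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                ordered += 1.0
            elif p == n:
                ordered += 0.5
    return ordered / (len(positives) * len(negatives))


def average_precision(labels: List[int], scores: List[float]) -> float:
    """Average precision, used here as an estimate of AUPR.

    Assumes scores are distinct; ties are broken by input order.
    """
    n_positive = sum(labels)
    if n_positive == 0:
        raise ValueError("at least one true edge is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / n_positive


# Toy gold standard and confidences for five candidate edges (made up).
gold = [1, 0, 1, 0, 0]
confidence = [0.9, 0.8, 0.7, 0.3, 0.1]

roc_value = auroc(gold, confidence)
pr_value = average_precision(gold, confidence)
```

The pairwise AUROC loop is quadratic in the number of edges, which is acceptable for a small example but would be replaced by a rank-based computation on networks of realistic size.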
MO-GENECI's Pareto front approximation facilitates easy selection of an appropriate solution based on generic characteristics of the input data. The best solution consistently emerged as the winner in all statistical tests, and in many cases the median precision solution showed no statistically significant difference from the winner.
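The notion of a Pareto front can be sketched in a few lines of Python. The example below assumes that every objective is minimized, filters a set of candidate solutions down to the non-dominated ones, and then picks the member in the middle of the front when sorted by the first objective. That middle pick is only a stand-in for choosing a compromise solution; how the paper defines its median precision solution is not stated in this abstract. All names and objective values are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere.

    All objectives are assumed to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def middle_of_front(front: List[Point]) -> Point:
    """Pick the middle member after sorting by the first objective."""
    if not front:
        raise ValueError("front must not be empty")
    ordered = sorted(front)
    return ordered[len(ordered) // 2]


# Made-up objective values for six candidate consensus networks.
candidates = [
    (0.1, 0.9),
    (0.3, 0.5),
    (0.5, 0.6),  # dominated by (0.3, 0.5)
    (0.6, 0.2),
    (0.9, 0.1),
    (0.7, 0.7),  # dominated by (0.3, 0.5)
]

front = pareto_front(candidates)
compromise = middle_of_front(front)
```

Each point on the front represents a different trade-off between the objectives, so a selection rule applied to the front determines which consensus network is finally reported.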
MO-GENECI not only achieves more accurate results than the individual techniques, but its flexibility and adaptability also overcome the uncertainty associated with the initial choice of technique. It is shown to intelligently select the most suitable techniques for each case. The source code is hosted in a public GitHub repository under the MIT license: https://github.com/AdrianSeguraOrtiz/MO-GENECI. Moreover, to facilitate installation and use, the software has been packaged as a Python package available on PyPI: https://pypi.org/project/geneci/.