Department of Statistics, University of Rome La Sapienza, Rome, Italy.
Department of Mathematics and Computer Science, University of Palermo, Palermo, Italy.
BMC Bioinformatics. 2022 Nov 11;23(1):474. doi: 10.1186/s12859-022-05026-w.
Huge amounts of molecular interaction data are continuously produced and stored in public databases. Many bioinformatics tools have been proposed in the literature to analyze these data by modeling them as different types of biological networks, but several problems remain unsolved when the analysis has to be performed on a large scale.
We propose DIAMIN, a high-level software library that facilitates the development of applications for the efficient analysis of large-scale molecular interaction networks. DIAMIN relies on distributed computing and is implemented in Java on top of the Apache Spark framework. It provides a set of functionalities that implement different tasks on an abstract representation of very large graphs, with built-in support for methods and algorithms commonly used to analyze these networks. DIAMIN was tested on data retrieved from two of the most widely used molecular interaction databases and proved to be highly efficient and scalable. As the provided examples show, users without any distributed programming experience can use DIAMIN to perform various types of data analysis and to implement new algorithms based on its primitives.
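To make the idea of a graph primitive concrete, the following is a minimal single-machine sketch in plain Java of one analysis commonly run on interaction networks: counting the degree of each molecule from an edge list. It is not DIAMIN's API and does not use Apache Spark; the class name `InteractionDegrees`, the method `degrees`, and the toy identifiers `P1` to `P4` are assumptions introduced only for illustration. In a distributed setting, the same per-node sums would typically be computed over partitions of the edge list and then combined.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of DIAMIN.
public class InteractionDegrees {

    /**
     * Counts, for every molecule, how many interactions it takes part in.
     * Each edge is a two-element array {a, b} describing one undirected interaction.
     */
    public static Map<String, Integer> degrees(List<String[]> edges) {
        Map<String, Integer> degree = new HashMap<>();
        for (String[] edge : edges) {
            // Each endpoint of the interaction gains one unit of degree.
            degree.merge(edge[0], 1, Integer::sum);
            degree.merge(edge[1], 1, Integer::sum);
        }
        return degree;
    }

    public static void main(String[] args) {
        // Toy network with made-up molecule identifiers.
        List<String[]> edges = List.of(
            new String[] {"P1", "P2"},
            new String[] {"P1", "P3"},
            new String[] {"P2", "P3"},
            new String[] {"P3", "P4"});
        System.out.println(degrees(edges));
    }
}
```

The per-edge update is an associative sum, which is what makes this kind of computation straightforward to split across workers in a framework such as Apache Spark.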
DIAMIN has proved successful in allowing users to solve specific biological problems that can be modeled with biological networks by using its functionalities. The software is freely available, which will hopefully favor its rapid diffusion through the scientific community, both for specific data analyses and for more complex tasks.