DIAMIN: a software library for the distributed analysis of large-scale molecular interaction networks.

Affiliations

Department of Statistics, University of Rome La Sapienza, Rome, Italy.

Department of Mathematics and Computer Science, University of Palermo, Palermo, Italy.

Publication information

BMC Bioinformatics. 2022 Nov 11;23(1):474. doi: 10.1186/s12859-022-05026-w.

DOI: 10.1186/s12859-022-05026-w
PMID: 36368948
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9652854/
Abstract

BACKGROUND

Huge amounts of molecular interaction data are continuously produced and stored in public databases. Many bioinformatics tools have been proposed to analyze these data by modeling them as different types of biological networks, but several problems remain unsolved when the analysis moves to a large scale.

RESULTS

We propose DIAMIN, a high-level software library that facilitates the development of applications for the efficient analysis of large-scale molecular interaction networks. DIAMIN relies on distributed computing and is implemented in Java on top of the Apache Spark framework. It delivers a set of functionalities that implement different tasks on an abstract representation of very large graphs, providing built-in support for methods and algorithms commonly used to analyze these networks. DIAMIN has been tested on data retrieved from two of the most widely used molecular interaction databases and proved highly efficient and scalable. As the provided examples show, users without any distributed programming experience can use DIAMIN to perform various types of data analysis and to implement new algorithms based on its primitives.

CONCLUSIONS

DIAMIN has proved successful in allowing users to solve specific biological problems that can be modeled with biological networks by using its functionalities. The software is freely available, which will hopefully allow its rapid diffusion through the scientific community for both specific data analyses and more complex tasks.
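
The abstract does not show DIAMIN's API, so the following is only a hypothetical sketch of the kind of primitive such a library might build on: computing node degrees from an edge list in a map-then-reduce style. It uses plain Java parallel streams as a stand-in for Apache Spark, and the names `DegreeSketch`, `Interaction`, and `degrees` are assumptions made for illustration, not part of DIAMIN.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DegreeSketch {

    /** One undirected interaction between two molecules (hypothetical type, not DIAMIN's). */
    public static final class Interaction {
        final String a;
        final String b;

        public Interaction(String a, String b) {
            this.a = a;
            this.b = b;
        }
    }

    /**
     * Map step: emit both endpoints of every edge.
     * Reduce step: count how many times each node was emitted.
     * A TreeMap keeps the result sorted by node name.
     */
    public static Map<String, Long> degrees(List<Interaction> edges) {
        return edges.parallelStream()
                .flatMap(e -> Stream.of(e.a, e.b))
                .collect(Collectors.groupingBy(n -> n, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Toy network with made-up node names, not data from the paper.
        List<Interaction> edges = List.of(
                new Interaction("P1", "P2"),
                new Interaction("P1", "P3"),
                new Interaction("P2", "P3"),
                new Interaction("P3", "P4"));
        degrees(edges).forEach((node, degree) -> System.out.println(node + "\t" + degree));
    }
}
```

In a distributed setting the same two steps would run over partitions of the edge list on different machines; the stream version only mimics that structure on a single machine.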

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/545f/9652854/8ce701e6904e/12859_2022_5026_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/545f/9652854/7a75bb28c4e3/12859_2022_5026_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/545f/9652854/022581352369/12859_2022_5026_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/545f/9652854/49dfd5f6c464/12859_2022_5026_Fig4_HTML.jpg
