A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks.

Author information

AnacletoLab - Department of Computer Science, Università degli Studi di Milano, Via Comelico 39, Milano, 20135, Italy.

Department of Dermatology, Fondazione IRCCS Ca' Granda, Ospedale Maggiore Policlinico, Milan, 20122, Italy.

Publication information

BMC Bioinformatics. 2018 Oct 15;19(Suppl 10):353. doi: 10.1186/s12859-018-2301-4.

DOI:10.1186/s12859-018-2301-4

PMID:30367594

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6191976/

Abstract

BACKGROUND

Several problems in network biology and medicine can be cast into a framework in which entities are represented as partially labeled networks, and the aim is to infer the labels (usually binary) of the unlabeled nodes. Connections represent functional or genetic similarity between entities, while the labelings are often highly unbalanced, that is, one class is largely under-represented: for instance, in automated protein function prediction (AFP) only a few proteins are annotated for most Gene Ontology terms, and in the disease-gene prioritization problem only a few genes are known to be involved in the etiology of a given disease. Imbalance-aware approaches that accurately predict node labels in biological networks are therefore required. Furthermore, such methods must be scalable, since input data can be large, as in the context of multi-species protein networks.

RESULTS

We propose a novel semi-supervised parallel enhancement of COSNET, an imbalance-aware algorithm built on the Hopfield neural model and recently proposed to solve the AFP problem. By adopting an efficient representation of the graph and assuming a sparse network topology, we empirically show that it can be efficiently applied to networks with millions of nodes. The key strategy to speed up the computations is to partition nodes into independent sets and to process each set in parallel by exploiting the power of GPU accelerators. This parallel technique ensures convergence to asymptotically stable attractors while preserving the asynchronous dynamics of the original model. Detailed experiments on real data and on large artificial instances of the problem highlight the scalability and efficiency of the proposed method.
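The independent-set strategy can be illustrated with a minimal Python sketch. It is not the authors' GPU implementation: the greedy coloring heuristic, the function names (`independent_sets`, `run_dynamics`), the fixed activation values `pos`/`neg`, the zero threshold, and the toy graph are all assumptions made for illustration, and the values are placeholders rather than parameters learned from data. The point it shows is that nodes in one independent set share no edges, so updating them all at once gives the same result as updating them one at a time, which is how the asynchronous Hopfield dynamics can be preserved while each set is processed in parallel.

```python
from collections import defaultdict


def independent_sets(adj):
    """Partition nodes into independent sets with a greedy coloring.

    adj maps node -> {neighbor: weight}; it must be symmetric and free of
    self-loops. Nodes sharing a color are never adjacent.
    """
    color = {}
    # Heuristic (assumption): color high-degree nodes first.
    for u in sorted(adj, key=lambda n: -len(adj[n])):
        used = {color[v] for v in adj[u] if v in color}
        c = 0
        while c in used:
            c += 1
        color[u] = c
    groups = defaultdict(list)
    for u, c in color.items():
        groups[c].append(u)
    return [groups[c] for c in sorted(groups)]


def run_dynamics(adj, labels, pos=1.0, neg=-1.0, threshold=0.0, max_sweeps=100):
    """Hopfield-style label inference with labeled nodes clamped.

    labels maps labeled node -> pos or neg; unlabeled nodes start at 0.0.
    pos, neg and threshold are illustrative placeholders.
    """
    state = {u: labels.get(u, 0.0) for u in adj}
    free_sets = [[u for u in s if u not in labels] for s in independent_sets(adj)]
    for _ in range(max_sweeps):
        changed = False
        for nodes in free_sets:
            # No two nodes in this set are neighbors, so each new value depends
            # only on nodes outside the set. On a GPU, each node could be
            # handled by its own thread.
            new = {}
            for u in nodes:
                field = sum(w * state[v] for v, w in adj[u].items()) - threshold
                new[u] = pos if field > 0 else neg
            for u, s in new.items():
                if state[u] != s:
                    changed = True
                state[u] = s
        if not changed:  # fixed point reached
            break
    return state


# Toy sparse network (hypothetical): a weighted path 0-1-2-3-4 with one
# positive and one negative labeled node.
adj = {
    0: {1: 2.0},
    1: {0: 2.0, 2: 1.0},
    2: {1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0},
    4: {3: 1.0},
}
labels = {0: 1.0, 4: -1.0}
final = run_dynamics(adj, labels)
print(final)
```

The adjacency-dictionary layout stores only existing edges, so memory grows with the number of edges rather than with the square of the number of nodes; a compressed sparse row array would be the usual choice on an actual GPU.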

CONCLUSIONS

By parallelizing COSNET we achieved an average speed-up of 180x in solving the AFP problem for S. cerevisiae, Mus musculus and Homo sapiens, while lowering memory requirements. In addition, to show the potential applicability of the method to huge biomolecular networks, we predicted node labels in artificially generated sparse networks involving hundreds of thousands to millions of nodes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29a6/6191976/925a707f3d7c/12859_2018_2301_Fig1_HTML.jpg
