

A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks.

Affiliations

AnacletoLab - Department of Computer Science, Università degli Studi di Milano, Via Comelico 39, Milano, 20135, Italy.

Department of Dermatology, Fondazione IRCCS Ca' Granda, Ospedale Maggiore Policlinico, Milan, 20122, Italy.

Publication information

BMC Bioinformatics. 2018 Oct 15;19(Suppl 10):353. doi: 10.1186/s12859-018-2301-4.

Abstract

BACKGROUND

Several problems in network biology and medicine can be cast into a framework in which entities are represented by partially labeled networks and the aim is to infer the labels (usually binary) of the unlabeled nodes. Connections represent functional or genetic similarity between entities, while the labelings are often highly unbalanced, that is, one class is largely under-represented. For instance, in automated protein function prediction (AFP), only a few proteins are annotated for most Gene Ontology terms, and in the disease-gene prioritization problem, only a few genes are actually known to be involved in the etiology of a given disease. Imbalance-aware approaches that accurately predict node labels in biological networks are therefore required. Furthermore, such methods must be scalable, since the input data can be large, as in the case of multi-species protein networks.

RESULTS

We propose a novel semi-supervised parallel enhancement of COSNET, an imbalance-aware algorithm built on the Hopfield neural model and recently proposed to solve the AFP problem. By adopting an efficient representation of the graph and assuming a sparse network topology, we empirically show that it can be efficiently applied to networks with millions of nodes. The key strategy to speed up the computations is to partition the nodes into independent sets, so that each set can be processed in parallel by exploiting the power of GPU accelerators. This parallel technique ensures convergence to asymptotically stable attractors while preserving the asynchronous dynamics of the original model. Detailed experiments on real data and on large artificial instances of the problem highlight the scalability and efficiency of the proposed method.
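The independent-set strategy can be illustrated with a small CPU-only sketch. This is not the authors' implementation. It assumes a plain bipolar Hopfield network (states +1/-1, zero threshold) instead of COSNET's imbalance-aware parametrization, uses a greedy largest-degree-first coloring to build the independent sets, and keeps labeled nodes clamped. All names (`build_weights`, `greedy_coloring`, `independent_sets`, `sweep`, `run`) and the toy graph are hypothetical. Because nodes in the same set share no edge, their new values depend only on nodes outside the set, so updating them together gives the same result as updating them one at a time. On a GPU, the inner loop over one set would plausibly map to one thread per node.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Weights = Dict[int, Dict[int, float]]


def build_weights(edges: Iterable[Tuple[int, int, float]]) -> Weights:
    """Sparse symmetric weights as adjacency maps (no self-loops assumed)."""
    w: Weights = {}
    for i, j, value in edges:
        w.setdefault(i, {})[j] = value
        w.setdefault(j, {})[i] = value
    return w


def greedy_coloring(w: Weights) -> Dict[int, int]:
    """Give each node the smallest color unused by its colored neighbours."""
    color: Dict[int, int] = {}
    for node in sorted(w, key=lambda n: -len(w[n])):  # largest degree first
        used = {color[nb] for nb in w[node] if nb in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c
    return color


def independent_sets(w: Weights) -> List[List[int]]:
    """Group nodes by color; each group contains no pair of adjacent nodes."""
    groups: Dict[int, List[int]] = {}
    for node, c in greedy_coloring(w).items():
        groups.setdefault(c, []).append(node)
    return [sorted(groups[c]) for c in sorted(groups)]


def sweep(w: Weights, state: Dict[int, int], sets: List[List[int]],
          clamped: Set[int]) -> bool:
    """Update every free node once, one independent set at a time."""
    changed = False
    for group in sets:
        # Compute all new values first, then write them back: this emulates
        # a simultaneous (parallel) update of the whole independent set.
        new_values = {}
        for i in group:
            if i in clamped:
                continue  # labeled nodes keep their known label
            field = sum(wij * state[j] for j, wij in w[i].items())
            new_values[i] = 1 if field > 0 else -1
        for i, value in new_values.items():
            if state[i] != value:
                state[i] = value
                changed = True
    return changed


def run(w: Weights, state: Dict[int, int], clamped: Set[int],
        max_sweeps: int = 100) -> Optional[int]:
    """Sweep until nothing changes; return the number of sweeps used."""
    sets = independent_sets(w)
    for n in range(1, max_sweeps + 1):
        if not sweep(w, state, sets, clamped):
            return n
    return None


# Toy network: node 0 is a known positive, node 5 a known negative,
# nodes 1-4 are unlabeled and start at 0 (an assumption of this sketch).
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (0, 2, 1.0)]
weights = build_weights(edges)
state = {0: 1, 1: 0, 2: 0, 3: 0, 4: 0, 5: -1}
sweeps_used = run(weights, state, clamped={0, 5})
print(independent_sets(weights), state, sweeps_used)
```

With symmetric weights and no self-loops, each update cannot increase the network energy, which is why processing the sets one after another still reaches a fixed point. The number of sequential steps per sweep equals the number of colors rather than the number of nodes, which is where a sparse topology helps.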

CONCLUSIONS

By parallelizing COSNET we achieved, on average, a speed-up of 180x in solving the AFP problem for the S. cerevisiae, Mus musculus and Homo sapiens organisms, while lowering memory requirements. In addition, to show the potential applicability of the method to huge biomolecular networks, we predicted node labels in artificially generated sparse networks with hundreds of thousands to millions of nodes.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29a6/6191976/925a707f3d7c/12859_2018_2301_Fig1_HTML.jpg
