

A scalable random walk with restart on heterogeneous networks with Apache Spark for ranking disease-related genes through type-II fuzzy data fusion.

Affiliations

Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran.


Publication information

J Biomed Inform. 2021 Mar;115:103688. doi: 10.1016/j.jbi.2021.103688. Epub 2021 Feb 2.

Abstract

One of the central goals of biology and medical science is to identify disease-related genes. Recent research uses gene/protein networks to find such genes. Because these networks contain false positive interactions, the results are often neither accurate nor reliable. Integrating multiple gene/protein networks can overcome this drawback, yielding a network with fewer false positive interactions. The integration method plays a crucial role in the quality of the constructed network. In this paper, we integrate several sources to build a reliable heterogeneous network, i.e., a network that includes nodes of different types. Because the gene/protein sources differ, four gene-gene similarity networks are constructed first and then integrated by applying the type-II fuzzy voter scheme. The resulting gene-gene network is linked to a disease-disease similarity network (itself the outcome of integrating four sources) through a two-part disease-gene network. We propose a novel algorithm, namely random walk with restart on the heterogeneous network method with fuzzy fusion (RWRHN-FF). Disease-related genes are determined by running RWRHN-FF over the heterogeneous network. Experimental results using leave-one-out cross-validation indicate that RWRHN-FF outperforms existing methods. The proposed algorithm can be applied to find new genes for prostate, breast, gastric, and colon cancers. Since the RWRHN-FF algorithm converges slowly on large heterogeneous networks, we propose a parallel implementation of the RWRHN-FF algorithm on the Apache Spark platform for high-throughput and reliable network inference. Experiments run on heterogeneous networks of different sizes indicate faster convergence compared to non-distributed implementations.
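The abstract does not give the update rule, so the following is only a minimal single-machine sketch of a random walk with restart over a two-layer (gene and disease) network, written in a commonly used generic form. It is not the authors' RWRHN-FF implementation: the fuzzy fusion step and the Spark parallelization are omitted, and the function names, the `jump` and `restart` parameters, and the toy matrices are all assumptions made for illustration.

```python
import numpy as np

def row_normalize(m):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    m = np.asarray(m, dtype=float)
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m), where=sums > 0)

def build_transition(gene_sim, disease_sim, gene_disease, jump=0.5):
    """Assemble a transition matrix over [genes, diseases].

    `jump` (hypothetical parameter) is the probability of crossing between
    the gene layer and the disease layer when a cross-layer link exists.
    """
    gene_disease = np.asarray(gene_disease, dtype=float)
    gene_linked = gene_disease.sum(axis=1, keepdims=True) > 0
    disease_linked = gene_disease.sum(axis=0)[:, None] > 0
    # Nodes without a cross-layer link keep all their mass in their own layer.
    gg = row_normalize(gene_sim) * np.where(gene_linked, 1 - jump, 1.0)
    dd = row_normalize(disease_sim) * np.where(disease_linked, 1 - jump, 1.0)
    gd = row_normalize(gene_disease) * jump
    dg = row_normalize(gene_disease.T) * jump
    return np.block([[gg, gd], [dg, dd]])

def random_walk_with_restart(transition, seed, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * T^T p + r * p0 until the L1 change is below tol."""
    p0 = seed / seed.sum()
    p = p0.copy()
    for _ in range(max_iter):
        nxt = (1 - restart) * (transition.T @ p) + restart * p0
        if np.abs(nxt - p).sum() < tol:
            return nxt
        p = nxt
    return p

# Toy data (invented for illustration): 4 genes, 2 diseases.
gene_sim = np.array([[0.0, 0.9, 0.1, 0.0],
                     [0.9, 0.0, 0.2, 0.1],
                     [0.1, 0.2, 0.0, 0.8],
                     [0.0, 0.1, 0.8, 0.0]])
disease_sim = np.array([[0.0, 1.0],
                        [1.0, 0.0]])
# Known associations: gene 0 with disease 0, gene 3 with disease 1.
gene_disease = np.array([[1, 0], [0, 0], [0, 0], [0, 1]])

transition = build_transition(gene_sim, disease_sim, gene_disease)

# Seed the walk at disease 0 (index 4) and its known gene (index 0).
seed = np.zeros(6)
seed[[0, 4]] = 1.0
scores = random_walk_with_restart(transition, seed)

# Rank the genes not already known for disease 0 by steady-state score.
candidates = [1, 2, 3]
ranking = sorted(candidates, key=lambda g: scores[g], reverse=True)
print(ranking)
```

In this toy setup, candidate genes are ranked by their steady-state visiting probability when the walker restarts at the query disease and its known genes. The iteration is dominated by a repeated matrix-vector product, which is the step a distributed platform such as Apache Spark would be expected to parallelize on large networks.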

