Nguyen Thanh, Yue Zongliang, Slominski Radomir, Welner Robert, Zhang Jianyi, Chen Jake Y
Informatics Institute in School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States.
Department of Biomedical Engineering, The University of Alabama at Birmingham, Birmingham, AL, United States.
Front Big Data. 2022 Nov 4;5:1016606. doi: 10.3389/fdata.2022.1016606. eCollection 2022.
In network biology, molecular functions can be characterized by network-based inference, or "guilt by association." PageRank-like tools have been applied to biomolecular interaction networks to estimate the relative significance of every molecule in the network. However, widely accessible data sets of gene-gene associations and protein-protein interactions contain a great deal of inherent noise. Developing robust tests to expand, filter, and rank molecular entities in disease-specific networks therefore remains an ad hoc data analysis process.
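As a minimal sketch of the PageRank-like, guilt-by-association idea described above (not WINNER's own scoring), the following Python computes a random walk with restart, also known as personalized PageRank, from a set of seed genes. The function name `random_walk_with_restart`, the restart probability of 0.3, and the toy gene network are illustrative assumptions. Nodes that are close to the seeds, or reachable from them through many paths, accumulate more of the walker's stationary probability.

```python
from collections import defaultdict


def random_walk_with_restart(edges, seeds, restart=0.3, tol=1e-12, max_iter=1000):
    """Personalized PageRank (random walk with restart) on an undirected network.

    Hypothetical helper for illustration only.
    edges: iterable of (u, v, weight) tuples with weight > 0.
    seeds: known disease-associated nodes where the walker restarts.
    Returns a dict node -> stationary probability (values sum to 1).
    """
    nbrs = defaultdict(dict)
    for u, v, w in edges:  # store each undirected edge in both directions
        nbrs[u][v] = nbrs[u].get(v, 0.0) + w
        nbrs[v][u] = nbrs[v].get(u, 0.0) + w
    seed_set = {s for s in seeds if s in nbrs}
    if not seed_set:
        raise ValueError("no seed node is present in the network")
    restart_vec = {u: (1.0 / len(seed_set) if u in seed_set else 0.0) for u in nbrs}
    strength = {u: sum(nbrs[u].values()) for u in nbrs}  # weighted degree
    p = dict(restart_vec)
    for _ in range(max_iter):
        # With probability `restart`, jump back to a uniformly chosen seed.
        new = {u: restart * restart_vec[u] for u in nbrs}
        # Otherwise follow an edge, choosing neighbours in proportion to weight.
        for u, pu in p.items():
            for v, w in nbrs[u].items():
                new[v] += (1.0 - restart) * pu * w / strength[u]
        delta = sum(abs(new[u] - p[u]) for u in nbrs)
        p = new
        if delta < tol:
            break
    return p
```

For example, with the hypothetical chain APP-APOE-CLU-TP53-MDM2 and APP as the only seed, scores fall off with distance from APP, which is the "guilt by association" intuition in numeric form.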
We describe a new biomolecular characterization and prioritization tool called Weighted In-Network Node Expansion and Ranking (WINNER). It takes any molecular interaction network as input and generates an optionally expanded network in which all nodes are ranked by their relevance to one another. To help users assess the robustness of results, WINNER provides two types of statistics. The first is a node-expansion p-value, which helps evaluate the statistical significance of adding "non-seed" molecules to the original biomolecular interaction network consisting of "seed" molecules and molecular interactions. The second is a node-ranking p-value, which helps evaluate the relative statistical significance of each node's contribution to the overall network architecture. We validated the robustness of WINNER in ranking top molecules by spiking noise into the network in several permutation experiments. We found that node-degree-preserving randomization of the gene network produced normally distributed ranking scores, which outperformed those produced by other gene network randomization techniques. Furthermore, we validated that a larger proportion of WINNER-ranked genes was associated with disease biology than of genes ranked by existing methods such as PageRank. We demonstrated the performance of WINNER in several case studies, including Alzheimer's disease, breast cancer, myocardial infarction, and triple-negative breast cancer (TNBC). In all of these case studies, the expanded and top-ranked genes identified by WINNER reveal disease biology more significantly than those identified by other gene prioritization software tools, including Ingenuity Pathway Analysis (IPA) and DiAMOND.
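The two statistics can be illustrated with a hedged Python sketch. The abstract does not give WINNER's formulas, so everything below is an assumption: the node-expansion p-value is modeled as a hypergeometric upper tail on how many of a candidate's partners are seeds, and the node-ranking p-value compares each node's unweighted PageRank score against scores from degree-preserving randomized networks (double-edge swaps), using a normal approximation motivated by the reported normality of those scores. All function names and parameters (`expansion_pvalue`, `degree_preserving_shuffle`, `ranking_pvalues`, `n_perm`) are hypothetical.

```python
import random
from collections import defaultdict
from math import comb, erfc, sqrt
from statistics import mean, stdev


def pagerank(edges, damping=0.85, n_iter=100):
    """Unweighted PageRank by power iteration on undirected (u, v) edges."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    n = len(nbrs)
    score = {u: 1.0 / n for u in nbrs}
    for _ in range(n_iter):
        new = {u: (1.0 - damping) / n for u in nbrs}
        for u in nbrs:
            for v in nbrs[u]:
                new[v] += damping * score[u] / len(nbrs[u])
        score = new
    return score


def expansion_pvalue(seed_links, degree, n_seeds, n_background):
    """Hypothetical node-expansion p-value: hypergeometric P(X >= seed_links).

    A candidate with `degree` partners drawn from `n_background` possible
    partners, of which `n_seeds` are seeds, hits >= seed_links seeds by chance.
    """
    hi = min(degree, n_seeds)
    tail = sum(comb(n_seeds, i) * comb(n_background - n_seeds, degree - i)
               for i in range(seed_links, hi + 1))
    return tail / comb(n_background, degree)


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Double-edge swaps (a-b, c-d) -> (a-d, c-b); every node keeps its degree.

    Assumes a simple undirected graph with at least two edges.
    """
    edges = list(edges)
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def ranking_pvalues(edges, n_perm=200, seed=0):
    """Hypothetical node-ranking p-value: observed score vs. a degree-preserving null.

    Fits a normal distribution to each node's null scores and returns the
    upper-tail probability of the observed score.
    """
    rng = random.Random(seed)
    observed = pagerank(edges)
    null = defaultdict(list)
    for _ in range(n_perm):
        shuffled = degree_preserving_shuffle(edges, 10 * len(edges), rng)
        for node, s in pagerank(shuffled).items():
            null[node].append(s)
    pvals = {}
    for node, s in observed.items():
        mu, sd = mean(null[node]), stdev(null[node])
        z = (s - mu) / sd if sd > 0 else 0.0
        pvals[node] = 0.5 * erfc(z / sqrt(2))  # P(Z >= z) for a standard normal
    return pvals
```

For instance, `expansion_pvalue(3, 3, 5, 20)` is the chance that a node with three partners in a 20-node background links to three of five seeds at random, 10/1140 (about 0.0088). Degree-preserving swaps are used here because they keep hub structure intact, so a small node-ranking p-value reflects network position beyond what degree alone explains.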
WINNER rankings correlate strongly with those of other ranking methods when the network contains sufficient node and edge information, which indicates high network quality. Users can apply WINNER to robustly evaluate lists of candidate genes, proteins, or metabolites produced by high-throughput biology experiments, provided that relevant gene, protein, or metabolic network information is available.
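Agreement between two rankings, such as WINNER scores and PageRank scores for the same genes, is commonly summarized with Spearman's rank correlation. The sketch below is an assumption about how such a comparison could be computed, not the paper's evaluation code; the helper names `average_ranks` and `spearman` are hypothetical. It compares only genes scored by both methods and gives tied scores the average of the ranks they span.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(scores_a, scores_b):
    """Spearman correlation between two {gene: score} dicts over shared genes."""
    genes = sorted(set(scores_a) & set(scores_b))
    if len(genes) < 2:
        raise ValueError("need at least two genes scored by both methods")
    ra = average_ranks([scores_a[g] for g in genes])
    rb = average_ranks([scores_b[g] for g in genes])
    n = len(genes)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("a ranking with all scores tied has undefined correlation")
    return cov / sqrt(var_a * var_b)
```

Computing the Pearson correlation of average ranks, rather than the shortcut formula based on squared rank differences, keeps the result exact when scores are tied.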