Himmelstein Daniel S, Zietz Michael, Rubinetti Vincent, Kloster Kyle, Heil Benjamin J, Alquaddoomi Faisal, Hu Dongbo, Nicholson David N, Hao Yun, Sullivan Blair D, Nagle Michael W, Greene Casey S
Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Related Sciences.
Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Biomedical Informatics, Columbia University, New York, New York, United States of America.
bioRxiv. 2023 Jan 7:2023.01.05.522941. doi: 10.1101/2023.01.05.522941.
Hetnets, short for "heterogeneous networks", contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, for many types of node pairs no training set of known relationships exists, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious not only about how metformin is related to breast cancer, but also about how the gene might be involved in insomnia. We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any two nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). We find that, for certain node type pairs, its predictions are broadly similar to those of previously described supervised approaches. Individual paths are scored based on the most specific paths of a given type. Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open source implementation of these methods in our new Python package named hetmatpy.
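The abstract does not spell out how "more frequently than would be expected by chance (based on node degree alone)" is computed, so the following is only a minimal sketch under stated assumptions. It counts paths between two nodes along one type of path (a metapath, i.e. a sequence of edge types) and compares that count with counts in degree-preserving permutations of the network, produced by XSwap-style edge swaps, to obtain an empirical p-value. The toy network and every function name (`count_paths`, `xswap`, `permutation_pvalue`) are hypothetical; this is not the hetmatpy API, and the published method may weight paths by node degree and model the null distribution differently.

```python
import random
from collections import defaultdict

# Hypothetical toy hetnet: each edge type maps to a list of (head, tail) pairs.
# Node names are illustrative only, not Hetionet identifiers.
TOY_NETWORK = {
    "binds": [("metformin", "G1"), ("metformin", "G2"), ("metformin", "G3"),
              ("C2", "G4"), ("C2", "G5"), ("C3", "G6")],
    "associates": [("G1", "D1"), ("G2", "D1"), ("G3", "D1"),
                   ("G4", "D2"), ("G5", "D3"), ("G6", "D2")],
}


def to_adjacency(pairs):
    """Map each head node to the list of tail nodes it points to."""
    adjacency = defaultdict(list)
    for head, tail in pairs:
        adjacency[head].append(tail)
    return adjacency


def count_paths(network, source, target, metapath):
    """Count walks from source to target that follow the edge types in metapath.

    In this toy example each step enters a new node type, so walks equal paths.
    """
    frontier = {source: 1}  # node -> number of walks that reach it so far
    for edge_type in metapath:
        adjacency = to_adjacency(network[edge_type])
        reached = defaultdict(int)
        for node, n_walks in frontier.items():
            for neighbor in adjacency.get(node, ()):
                reached[neighbor] += n_walks
        frontier = reached
    return frontier.get(target, 0)


def xswap(pairs, n_attempts, rng):
    """Shuffle one edge type while preserving every node's degree.

    Picks two edges (a, b) and (c, d) and rewires them to (a, d) and (c, b),
    skipping any swap that would duplicate an existing edge.
    """
    pairs = list(pairs)
    present = set(pairs)
    for _ in range(n_attempts):
        i, j = rng.randrange(len(pairs)), rng.randrange(len(pairs))
        (a, b), (c, d) = pairs[i], pairs[j]
        if a == c or b == d or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        pairs[i], pairs[j] = (a, d), (c, b)
    return pairs


def permutation_pvalue(network, source, target, metapath, n_perms=200, seed=0):
    """Assumed null model: the fraction of degree-preserving permutations that
    yield at least as many source-target paths as the real network."""
    rng = random.Random(seed)
    observed = count_paths(network, source, target, metapath)
    at_least_as_many = 0
    for _ in range(n_perms):
        permuted = {etype: xswap(pairs, 10 * len(pairs), rng)
                    for etype, pairs in network.items()}
        if count_paths(permuted, source, target, metapath) >= observed:
            at_least_as_many += 1
    # Adding 1 to numerator and denominator keeps the estimate strictly positive.
    return observed, (at_least_as_many + 1) / (n_perms + 1)


if __name__ == "__main__":
    observed, p = permutation_pvalue(
        TOY_NETWORK, "metformin", "D1", ["binds", "associates"])
    print(observed, p)
```

Here metformin reaches D1 through all three of its genes, which is the maximum its degree allows, so shuffled networks rarely match that count and the empirical p-value is small. Swapping edges rather than sampling them independently keeps every node's degree fixed, which is one way to build a "node degree alone" baseline.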