Liu Zhenpeng, Zhang Shengcong, Zhang Jialiang, Jiang Mingxiao, Liu Yi
Information Technology Center, Hebei University, Baoding 071002, China.
School of Cyber Security and Computer, Hebei University, Baoding 071002, China.
Entropy (Basel). 2023 Jun 29;25(7):998. doi: 10.3390/e25070998.
Most heterogeneous information network (HIN) embedding methods use meta-paths to guide random walks that sample the HIN for representation learning, in order to overcome the tendency of traditional random walks to favor high-order nodes. Their performance depends on how well the chosen meta-paths suit the HIN at hand. Defining meta-paths requires domain expertise, which makes the results overly dependent on the meta-paths. Moreover, a single meta-path can hardly represent the structure of a complex HIN. In a meta-path-guided random walk, heterogeneous structures (e.g., node types) that are not among the node types specified by the meta-path are skipped, so this heterogeneous information is lost. In this paper, we propose HeteEdgeWalk, a method that does not involve meta-paths. We design a dynamically adjusted bidirectional edge-sampling walk strategy. Specifically, edge sampling and a store of recently selected edge types are used to sample the network structure in a more balanced and comprehensive way. Finally, node classification and clustering experiments are performed on four real-world HINs, with in-depth analysis. The results show an improvement of up to 2% in node classification and at least 0.6% in clustering over the baselines, indicating that the method effectively captures semantic information from HINs.
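The abstract names the ingredients of the walk (edge sampling, a store of recently selected edge types, bidirectional growth) but gives no procedure. The following is a minimal Python sketch of one way these ingredients could fit together; it is not the authors' HeteEdgeWalk algorithm. All function names, the `memory` parameter, the uniform sampling, and the rule "prefer edge types not in recent memory" are assumptions made for illustration.

```python
import random
from collections import defaultdict, deque


def build_adjacency(edges):
    """Map each node to its incident (neighbor, edge_type) pairs; edges are treated as undirected."""
    adj = defaultdict(list)
    for u, v, etype in edges:
        adj[u].append((v, etype))
        adj[v].append((u, etype))
    return adj


def next_step(adj, node, recent_types, rng):
    """Pick a neighbor of `node`, preferring edge types absent from `recent_types` (assumed rule)."""
    by_type = defaultdict(list)
    for nbr, etype in adj[node]:
        by_type[etype].append(nbr)
    if not by_type:
        return None
    # Fall back to all incident edge types when every type was used recently.
    fresh = [t for t in by_type if t not in recent_types]
    etype = rng.choice(sorted(fresh or by_type))
    recent_types.append(etype)  # a bounded deque drops the oldest type automatically
    return rng.choice(by_type[etype])


def bidirectional_edge_walk(adj, edges, walk_length, memory=2, seed=None):
    """Start from a uniformly sampled edge and grow the walk from both of its ends."""
    rng = random.Random(seed)
    u, v, etype = rng.choice(edges)
    walk = deque([u, v])
    # One memory of recent edge types per direction (hypothetical design choice).
    mem_fwd = deque([etype], maxlen=memory)
    mem_bwd = deque([etype], maxlen=memory)
    while len(walk) < walk_length:
        grew = False
        nxt = next_step(adj, walk[-1], mem_fwd, rng)
        if nxt is not None:
            walk.append(nxt)
            grew = True
        if len(walk) < walk_length:
            prv = next_step(adj, walk[0], mem_bwd, rng)
            if prv is not None:
                walk.appendleft(prv)
                grew = True
        if not grew:
            break
    return list(walk)
```

In this sketch, starting from an edge rather than a node means high-degree nodes are not the only likely entry points, and the bounded memory discourages the walk from repeating the same edge type on consecutive steps. Walks produced this way could then be fed to a skip-gram model, as is common for random-walk embeddings; whether HeteEdgeWalk does exactly this is not stated in the abstract.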