Bessmertny Igor A, Huang Xiaoxi, Platonov Aleksei V, Yu Chuqiao, Koroleva Julia A
School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China.
Saint Petersburg National Research University of Information Technology, Mechanics and Optics, St. Petersburg 197101, Russia.
Entropy (Basel). 2020 Feb 28;22(3):275. doi: 10.3390/e22030275.
Search engines find documents that contain patterns from a query. This approach works well for alphabetic languages such as English. Chinese, however, is highly dependent on context. A significant problem in Chinese text processing is the absence of spaces between words, so the text must be segmented into words before any other processing. Algorithms for Chinese text segmentation should take context into account; that is, how a word is segmented depends on the surrounding ideograms. Because existing segmentation algorithms are imperfect, we consider an approach that builds the context from all possible n-grams surrounding the query words. This paper proposes a quantum-inspired approach to ranking Chinese text documents by their relevance to the query. In particular, the approach uses Bell's test, which measures the quantum entanglement of two words within the context. The contexts of words are built using the hyperspace analogue to language (HAL) algorithm. Experiments conducted in three domains demonstrated that the proposed approach provides acceptable results.
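As a rough illustration of the two building blocks named in the abstract, the sketch below enumerates character n-grams and builds a HAL-style co-occurrence table in one common formulation, where a neighbour at distance d inside a window of size w contributes a weight of w - d + 1. The function names `char_ngrams` and `hal_matrix`, the window size, and the use of single characters as tokens are assumptions made for this example; the abstract does not specify these details, and the Bell's test ranking step is not shown.

```python
from collections import defaultdict


def char_ngrams(text, n_max=2):
    """Return all contiguous character n-grams of length 1..n_max.

    Hypothetical helper: enumerating every n-gram avoids committing
    to a single word segmentation of unspaced Chinese text.
    """
    return [
        text[i:i + n]
        for i in range(len(text))
        for n in range(1, n_max + 1)
        if i + n <= len(text)
    ]


def hal_matrix(tokens, window=2):
    """Build a HAL-style co-occurrence table (assumed formulation).

    hal[target][neighbour] accumulates a weight for each neighbour
    that precedes the target within `window` positions. Closer
    neighbours get larger weights: window - distance + 1.
    """
    hal = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break  # ran off the start of the text
            hal[target][tokens[j]] += window - distance + 1
    return hal


# Treat each character as a token (an assumption for this sketch).
tokens = list("我爱北京")
hal = hal_matrix(tokens, window=2)
ngrams = char_ngrams("北京", n_max=2)
```

Each row of the resulting table can be read as a context vector for its token. A real system would likely build such vectors over n-grams around the query words rather than over single characters.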