China Telecom Research Institute, Beijing, China.
PLoS One. 2023 Oct 18;18(10):e0293091. doi: 10.1371/journal.pone.0293091. eCollection 2023.
The technology disclosure document of a patent application is an important basis for judging a patent's novelty and uniqueness. Automated evaluation can address the long turnaround and strong subjectivity of human evaluation. However, text similarity algorithms based on corpora and deep learning suffer from insufficient cross-library training data and insufficient emphasis on core content when they judge the similarity of patent application technology disclosure documents, which limits their performance and practical application. In this paper, we propose a similarity evaluation method for patent application technology disclosure documents based on a multi-dimensional fusion strategy. First, in the text preprocessing stage, we propose word segmentation reconstruction and a similarity evaluation optimization strategy based on the weighted fusion of word frequency and part-of-speech scores. Then, we propose a similarity calculation method based on two new mapping spaces, dot matrix and image, to achieve a more diversified comprehensive evaluation. The algorithm was evaluated on four published text similarity matching datasets (with 0-5 or 0/1 labels) and a set of patent application technology disclosure documents. On the published datasets, the proposed method improves discrimination accuracy by about 10% over the traditional vector semantic model and matches the discriminative ability of lightweight deep learning models without requiring training. On the sample dataset of patent application technology disclosure documents, its discrimination accuracy is superior to that of the traditional vector semantic model (20%) and of various deep learning models (1%-8%), and its precision and recall are relatively balanced. Visual analysis on the patent application technology disclosure document dataset also supports the effectiveness and reliability of the similarity calculation in the dot matrix and image spaces, providing a new approach to similarity evaluation between patent application technology disclosure documents.
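As an illustration of the weighted-fusion idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes tokens arrive already segmented and tagged, and it fuses relative word frequency with a part-of-speech score through a single mixing weight before computing a cosine similarity. The `POS_SCORE` table, the `alpha` parameter, and the function names are hypothetical choices made for this example; the abstract does not specify the actual scores, the fusion formula, or the dot matrix and image mappings.

```python
import math
from collections import Counter

# Hypothetical part-of-speech scores (an assumption for this sketch):
# content words are assumed to carry more of a disclosure's core content.
POS_SCORE = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.5, "OTHER": 0.1}


def fused_weights(tagged_tokens, alpha=0.5):
    """Weight each word by alpha * relative frequency + (1 - alpha) * POS score.

    tagged_tokens is a list of (word, pos_tag) pairs; alpha is an assumed
    mixing weight between the frequency term and the POS term.
    """
    counts = Counter(word for word, _ in tagged_tokens)
    total = sum(counts.values())
    pos_of = {word: pos for word, pos in tagged_tokens}  # last tag seen wins
    return {
        word: alpha * (count / total)
        + (1 - alpha) * POS_SCORE.get(pos_of[word], POS_SCORE["OTHER"])
        for word, count in counts.items()
    }


def weighted_cosine(doc_a, doc_b, alpha=0.5):
    """Cosine similarity between the fused weight vectors of two documents."""
    wa = fused_weights(doc_a, alpha)
    wb = fused_weights(doc_b, alpha)
    dot = sum(wa[w] * wb[w] for w in wa.keys() & wb.keys())
    norm_a = math.sqrt(sum(v * v for v in wa.values()))
    norm_b = math.sqrt(sum(v * v for v in wb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

Calling `weighted_cosine` on two lists of `(word, tag)` pairs returns a value between 0 and 1. Raising `alpha` shifts the emphasis toward word frequency, while lowering it lets the part-of-speech scores dominate.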