Sun Wei, Liu Hui, Wang Yu, Shi Weihao, Wang Xiao, Zou Zhiwei
State Grid Anhui Electric Power Research Institute, Hefei, 230601, China.
Sci Rep. 2025 Aug 2;15(1):28284. doi: 10.1038/s41598-025-12173-6.
Power grid business systems handle massive volumes of data, and data retrieval frequently runs into redundancy. Conventional decision-tree-based methods struggle to retrieve data accurately in the presence of redundant interference. To address this, this study proposes a multi-level redundant data retrieval method for the grid resource business middle platform, based on an improved decision tree algorithm. The method first builds a multi-level data decision tree from the middle-platform data, then prunes it with an algorithm based on the Akaike information criterion (AIC). An ant colony algorithm optimizes the pruning parameters, and the optimal parameters are then applied to the middle-platform data decision tree to produce the improved tree. The retrieval method built on the improved tree then uses a purpose-designed duplicate-data processing flow and a multi-level redundant data discrimination mechanism to quickly retrieve hierarchical redundant data in grid resource business. Experimental results show that the improved decision tree algorithm raises multi-level redundant data retrieval accuracy by 14%. The optimized decision tree model represents the hierarchy of grid resource service data more comprehensively and effectively retrieves multi-level redundant data, including both image and text categories, from the middle-platform data. The maximum F1-score reaches 0.99, and the retrieval time is only 4.5 s, which is 1.5 s below the predefined threshold, indicating strong practical performance.
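The abstract does not give the pruning criterion or the ant colony procedure in detail, so the following is only a minimal sketch of the general idea, under stated assumptions: it uses the textbook form AIC = 2k - 2 ln L with the leaf count standing in for k, replaces the ant colony search with a plain exhaustive search over a few candidates, and computes the F1-score from raw counts. The `PrunedTree` class, its field names, and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PrunedTree:
    """Hypothetical summary of one pruned candidate tree."""
    pruning_param: float   # hypothetical pruning strength
    num_leaves: int        # used here as the number of free parameters k
    log_likelihood: float  # ln L of the training data under this tree


def aic(tree: PrunedTree) -> float:
    """Textbook Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * tree.num_leaves - 2 * tree.log_likelihood


def select_by_aic(candidates: list[PrunedTree]) -> PrunedTree:
    """Pick the lowest-AIC candidate by exhaustive search.

    Assumption: this stands in for the ant colony optimization step,
    whose details the abstract does not specify.
    """
    return min(candidates, key=aic)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), computed from true/false positive and false negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up candidates, for illustration only.
candidates = [
    PrunedTree(pruning_param=0.0, num_leaves=40, log_likelihood=-100.0),
    PrunedTree(pruning_param=0.01, num_leaves=12, log_likelihood=-110.0),
    PrunedTree(pruning_param=0.1, num_leaves=3, log_likelihood=-160.0),
]
best = select_by_aic(candidates)
```

With these made-up numbers the middle candidate wins: the unpruned tree fits best but pays a large complexity penalty, while the heavily pruned tree loses too much likelihood. A metaheuristic such as ant colony optimization would matter when the space of pruning parameters is too large to enumerate this way.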