Association rule mining with a special rule coding and dynamic genetic algorithm for air quality impact factors in Beijing, China.

Affiliations

School of Artificial Intelligence and Big Data, Hefei University, Hefei, Anhui, China.

Key Laboratory of Intelligent Building and Building Energy Efficiency, Anhui Jianzhu University, Hefei, Anhui, China.

Publication information

PLoS One. 2024 Mar 4;19(3):e0299865. doi: 10.1371/journal.pone.0299865. eCollection 2024.

Abstract

Understanding air quality requires a comprehensive understanding of the factors that influence it. Most association rule techniques focus on high-frequency items. They ignore the potential importance of low-frequency items and waste storage space. This paper therefore proposes a dynamic genetic association rule mining algorithm. It combines an improved dynamic genetic algorithm with association rule mining so that important rules involving low-frequency items can be mined. First, in the chromosome coding phase of the genetic algorithm, an innovative multi-information coding strategy is proposed that selectively stores similar values of different levels in a single storage unit. This avoids storing all values at once and makes the later mining of valid rules more efficient. Second, evaluation indicators of association rule mining are weighted to form a new evaluation index. These indicators include support, confidence and lift. The weighted index removes the need to set minimum thresholds to obtain high-interest rules. Finally, dynamic crossover and mutation rates are used to improve the search efficiency of the algorithm. In the experiments, the 2016 annual air quality data set of Beijing is used to verify three aspects:
- the effectiveness of the unit point multi-information coding strategy in reducing rule storage space;
- the effectiveness of mining rules formed by low-frequency itemsets;
- the effectiveness of combining the rule mining algorithm with a swarm intelligence optimization algorithm, in terms of search time and convergence.
The unit point multi-information coding strategy reduced rule storage consumption by 50%. The new evaluation index mined more interesting rules, with interest levels of up to 90%, including rules formed by lower-frequency items. Search time was reduced by about 20% compared with several meta-heuristic algorithms, and convergence also improved.
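The abstract does not give the exact formulas. The following is only a minimal Python sketch, under stated assumptions, of two ideas it describes. The first is a weighted evaluation index built from support, confidence and lift. The second is a crossover and mutation rate schedule that changes across generations. Several parts are hypothetical choices for illustration, not the authors' method:
- the equal weights;
- the lift normalisation `lift / (1 + lift)`;
- the linear rate schedule and its bounds;
- all function names (`support`, `rule_metrics`, `weighted_fitness`, `dynamic_rates`).

The toy transactions are invented and are not drawn from the Beijing data set.

```python
from typing import FrozenSet, Iterable, Sequence, Tuple

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: Sequence[Transaction]) -> float:
    """Fraction of transactions containing every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if items <= t) / len(transactions)


def rule_metrics(antecedent: Iterable[str], consequent: Iterable[str],
                 transactions: Sequence[Transaction]) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    s_rule = support(a | c, transactions)
    s_a = support(a, transactions)
    s_c = support(c, transactions)
    confidence = s_rule / s_a if s_a > 0 else 0.0
    lift = confidence / s_c if s_c > 0 else 0.0
    return s_rule, confidence, lift


def weighted_fitness(antecedent: Iterable[str], consequent: Iterable[str],
                     transactions: Sequence[Transaction],
                     weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Hypothetical weighted index: one score per rule, so no minimum thresholds."""
    s, conf, lift = rule_metrics(antecedent, consequent, transactions)
    # Assumption: lift is unbounded, so squash it into [0, 1) before weighting.
    lift_norm = lift / (1.0 + lift)
    w_s, w_c, w_l = weights
    return w_s * s + w_c * conf + w_l * lift_norm


def dynamic_rates(generation: int, max_generations: int,
                  pc_range: Tuple[float, float] = (0.6, 0.9),
                  pm_range: Tuple[float, float] = (0.01, 0.1)) -> Tuple[float, float]:
    """Hypothetical linear schedule: crossover rate falls, mutation rate rises."""
    if max_generations <= 0 or not 0 <= generation <= max_generations:
        raise ValueError("need max_generations > 0 and 0 <= generation <= max_generations")
    progress = generation / max_generations
    pc_min, pc_max = pc_range
    pm_min, pm_max = pm_range
    pc = pc_max - (pc_max - pc_min) * progress  # favour recombination early on
    pm = pm_min + (pm_max - pm_min) * progress  # add diversity later to escape local optima
    return pc, pm


if __name__ == "__main__":
    # Invented toy transactions (not from the paper's data set).
    data = [
        frozenset({"PM2.5=high", "wind=low", "humidity=high"}),
        frozenset({"PM2.5=high", "wind=low"}),
        frozenset({"PM2.5=low", "wind=high"}),
        frozenset({"PM2.5=high", "humidity=high"}),
    ]
    print(rule_metrics({"wind=low"}, {"PM2.5=high"}, data))
    print(round(weighted_fitness({"wind=low"}, {"PM2.5=high"}, data), 4))  # 0.6905
    print(dynamic_rates(50, 100))
```

The weighted index folds all three measures into one score. A genetic algorithm can therefore rank candidate rules by it directly, without first discarding rules below a minimum support. Under this kind of scheme, rules built from low-frequency items can still survive selection.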

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b186/10911623/3e835e2f22a1/pone.0299865.g001.jpg