Iltanen Kati, Kiviharju Sami, Ao Lida, Juhola Martti, Pyykkö Ilmari
School of Information Sciences, University of Tampere, Finland.
Stud Health Technol Inform. 2013;192:452-6.
In this study, we examine the applicability of association rules for analysing high-dimensional data concerning age-related hearing impairment (ARHI). The ARHI data of the study contain hundreds of variables concerning phenotype, genotype and environmental factors. The number of association rules produced from the data is too large for manual exploration in raw form; furthermore, the rules overlap. Thus, the focus of our study is to develop an approach to cluster association rules into subsets and to summarise and represent the resulting rule subsets for easier exploration. The results show that it is possible to efficiently extract rules representing interesting environmental factor-gene or gene-gene interactions. Finding suitable parameters for association rule mining and being able to post-process the mined rules are essential. The developed approach facilitates rule exploration by grouping rules whose items concern the same phenomenon into the same subset and by revealing overlapping rules.
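The abstract does not state how rules are compared or grouped, so the following is only a minimal sketch of the general idea, under stated assumptions: rules are grouped when the Jaccard similarity of their item sets reaches a threshold (single-link grouping via union-find), and a rule is treated as overlapping another when both share a consequent and one antecedent is a strict subset of the other. The item names, the threshold and the helper names `cluster_rules` and `overlapping_pairs` are hypothetical and not taken from the study.

```python
from itertools import combinations

# Hypothetical rules as (antecedent, consequent); item names are illustrative only.
RULES = [
    (frozenset({"noise_exposure", "gene_A"}), frozenset({"hearing_loss"})),
    (frozenset({"noise_exposure"}), frozenset({"hearing_loss"})),
    (frozenset({"gene_B", "gene_C"}), frozenset({"tinnitus"})),
    (frozenset({"gene_B"}), frozenset({"tinnitus"})),
    (frozenset({"smoking"}), frozenset({"cardiovascular"})),
]


def items(rule):
    """All items mentioned by a rule, on either side."""
    return rule[0] | rule[1]


def jaccard(a, b):
    """Jaccard similarity of two non-empty item sets."""
    return len(a & b) / len(a | b)


def cluster_rules(rules, threshold=0.5):
    """Group rule indices whose item sets are similar (assumed single-link criterion)."""
    parent = list(range(len(rules)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(rules)), 2):
        if jaccard(items(rules[i]), items(rules[j])) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(rules)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def overlapping_pairs(rules):
    """Return (general, specific) index pairs: same consequent, strict-subset antecedent."""
    return [
        (i, j)
        for i in range(len(rules))
        for j in range(len(rules))
        if i != j and rules[i][1] == rules[j][1] and rules[i][0] < rules[j][0]
    ]


if __name__ == "__main__":
    print(cluster_rules(RULES))
    print(overlapping_pairs(RULES))
```

On this toy input the two hearing-loss rules and the two tinnitus rules each form a subset, and the shorter rule in each pair is reported as overlapping the longer one. With hundreds of variables, the pairwise comparison would likely need a more scalable grouping strategy.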