Zorman Milan, Verlic M
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia.
J Int Med Res. 2009 Sep-Oct;37(5):1543-51. doi: 10.1177/147323000903700532.
Progress in biomedical research has resulted in an explosive growth of data. Use of the World Wide Web for sharing data has opened up possibilities for exhaustive data mining analysis. Symbolic machine learning approaches used in data mining, especially ensemble approaches, produce large sets of patterns that need to be evaluated. Manual evaluation of all patterns by a human expert is almost impossible. We propose a new approach to the evaluation of machine learning-induced knowledge by introducing a pre-evaluation step. Pre-evaluation is the automatic evaluation of patterns obtained from the data mining phase, using text mining techniques and sentiment analysis. It filters patterns according to the support found for them in online resources, such as publicly available repositories of scientific papers and reports related to the problem. The domain expert can then more easily identify the patterns or rules that are potential candidates for new knowledge.
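The pre-evaluation step described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Pattern` class, the toy `POSITIVE` and `NEGATIVE` word lists, the sentence-level co-occurrence test, and the `threshold` parameter are all assumptions made here for the example. A real system would query online repositories and use proper text mining and sentiment analysis tools.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicons standing in for a real sentiment analysis component.
POSITIVE = {"associated", "increases", "predicts", "confirms", "supports"}
NEGATIVE = {"no", "not", "unrelated", "refutes", "contradicts"}


@dataclass(frozen=True)
class Pattern:
    """A mined rule reduced to the key terms it relates (assumed lowercase)."""
    name: str
    terms: tuple


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def support_score(pattern, documents):
    """Score a pattern by the sentences that mention all of its terms.

    A sentence containing a negative cue counts -1; otherwise a sentence
    containing a positive cue counts +1; neutral sentences count 0.
    """
    score = 0
    for doc in documents:
        for sentence in re.split(r"[.!?]", doc):
            tokens = set(tokenize(sentence))
            if not all(term in tokens for term in pattern.terms):
                continue  # sentence does not mention the whole pattern
            if tokens & NEGATIVE:
                score -= 1
            elif tokens & POSITIVE:
                score += 1
    return score


def pre_evaluate(patterns, documents, threshold=1):
    """Keep patterns whose score reaches the threshold, best first.

    Filtering on a minimum score is an assumption of this sketch; the
    direction and value of the cut would depend on the expert's goal.
    """
    scored = [(p, support_score(p, documents)) for p in patterns]
    kept = [(p, s) for p, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Calling `pre_evaluate(patterns, documents)` on a list of `Pattern` objects and a list of document strings returns `(pattern, score)` pairs, which a domain expert could then review instead of the full pattern set.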