Higher Colleges of Technology, Dubai, UAE.
School of Engineering, Malla Reddy University, Hyderabad, India.
Big Data. 2022 Apr;10(2):151-160. doi: 10.1089/big.2021.0132. Epub 2021 Sep 23.
Extracting useful information from big medical datasets is a difficult task in the big data age. Data mining relies on a variety of classification algorithms to analyze such datasets, but these algorithms are insufficient to handle big medical data. This work proposes an efficient, ensemble-based classification framework for big medical data to address this problem. The framework first applies preprocessing to remove noise, missing values, and unwanted features. It then selects a subset of classifiers from a pool of classifiers and combines the selected classifiers into a hybrid system for efficient classification. The methodology also involves incremental learning from data samples, explaining the predicted outputs, and achieving high classification performance. The simulation is implemented in Java, and the Cleveland Heart Disease and Diabetes big datasets are used for classification. The experimental results show that the proposed ensemble algorithm classifies more efficiently than existing algorithms in terms of accuracy, precision, recall, F-measure, and execution time.
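The selection-and-combination step can be illustrated with a minimal Java sketch. The abstract does not state how classifiers are chosen from the pool or how their outputs are combined, so this sketch assumes two common choices: ranking by accuracy on a validation set and combining by majority vote over binary labels. The class name `EnsembleSketch`, the `Classifier` interface, the helper methods, and the toy data are all hypothetical and are not taken from the paper.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

public class EnsembleSketch {

    /** Hypothetical classifier type: maps a feature vector to a binary label (0 or 1). */
    interface Classifier extends Function<double[], Integer> {}

    /** Fraction of validation rows that the classifier labels correctly. */
    static double accuracy(Classifier c, double[][] x, int[] y) {
        int correct = 0;
        for (int i = 0; i < x.length; i++) {
            if (c.apply(x[i]) == y[i]) {
                correct++;
            }
        }
        return (double) correct / x.length;
    }

    /** Assumed selection rule: keep the k classifiers with the highest validation accuracy. */
    static List<Classifier> selectTopK(List<Classifier> pool, double[][] x, int[] y, int k) {
        List<Classifier> sorted = new ArrayList<>(pool);
        sorted.sort(Comparator.comparingDouble((Classifier c) -> accuracy(c, x, y)).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    /** Assumed combination rule: majority vote over the selected classifiers. */
    static int vote(List<Classifier> selected, double[] row) {
        int ones = 0;
        for (Classifier c : selected) {
            ones += c.apply(row);
        }
        return (2 * ones > selected.size()) ? 1 : 0;
    }

    public static void main(String[] args) {
        // Toy validation data (hypothetical): one feature per row, binary label.
        double[][] x = {{1.0}, {2.0}, {3.0}, {4.0}};
        int[] y = {0, 0, 1, 1};

        // Pool of simple threshold rules standing in for trained classifiers.
        List<Classifier> pool = new ArrayList<>();
        pool.add(r -> r[0] > 2.5 ? 1 : 0);
        pool.add(r -> r[0] > 1.5 ? 1 : 0);
        pool.add(r -> r[0] > 3.5 ? 1 : 0);
        pool.add(r -> 1); // constant predictor, weakest on the toy data

        List<Classifier> selected = selectTopK(pool, x, y, 3);
        System.out.println("Ensemble label for 3.2: " + vote(selected, new double[] {3.2}));
    }
}
```

In this sketch the constant predictor is dropped during selection because it has the lowest validation accuracy, and the remaining three rules vote on each new sample. A real pool would hold trained models, and the selection criterion could weigh diversity or execution time as well as accuracy.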