Cooperative Innovation Center of Internet Healthcare, Zhengzhou University, Zhengzhou, China.
School of Computing, University of Southern Mississippi, Hattiesburg, MS, USA.
J Healthc Eng. 2017;2017:8051673. doi: 10.1155/2017/8051673. Epub 2017 Jun 15.
Identifying and preventing disease risk as early as possible through regular physical examinations is important. We formulate disease risk prediction as a multilabel classification problem and propose a novel method, Ensemble Label Power-set Pruned datasets Joint Decomposition (ELPPJD). First, we transform the multilabel classification problem into a multiclass classification problem. Then, we propose pruned datasets and joint decomposition methods to deal with the resulting imbalanced learning problem. Two strategies, size balanced (SB) and label similarity (LS), are designed to decompose the training dataset. The experimental dataset comes from real physical examination records. We compare the performance of ELPPJD under the two decomposition strategies, and we also compare ELPPJD with the classic multilabel classification methods RAkEL and HOMER. The experimental results show that ELPPJD with the label similarity strategy achieves outstanding performance.
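The first step described above, turning a multilabel problem into a multiclass one, can be illustrated with a generic label power-set transformation. The following is a minimal sketch and not the authors' implementation: the function names `label_powerset` and `prune_rare`, the toy label matrix `Y`, and the `min_count` threshold are all assumptions introduced for illustration, and the pruning rule shown (dropping samples whose label set is rare) is only one plausible reading of "pruned datasets".

```python
from collections import Counter


def label_powerset(Y):
    """Map each distinct label set (a row of Y) to a single class id."""
    class_of = {}   # label-set tuple -> class id, in order of first appearance
    classes = []    # class id per sample
    for row in Y:
        key = tuple(row)
        if key not in class_of:
            class_of[key] = len(class_of)
        classes.append(class_of[key])
    return classes, class_of


def prune_rare(classes, min_count):
    """Return indices of samples whose class occurs at least min_count times.

    Hypothetical pruning rule: the abstract does not specify the criterion.
    """
    counts = Counter(classes)
    return [i for i, c in enumerate(classes) if counts[c] >= min_count]


# Toy binary label matrix: 6 samples, 3 labels (illustrative data only).
Y = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
]

classes, class_of = label_powerset(Y)
kept = prune_rare(classes, min_count=2)
print(classes)  # one multiclass target per sample
print(kept)     # indices that survive pruning
```

Each distinct combination of labels becomes one class, so any ordinary multiclass classifier can be trained on `classes`. Because many label combinations occur only a few times, the resulting class distribution tends to be highly imbalanced, which is the problem the pruning and decomposition steps are meant to address.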