Postgraduate Program in Cognition and Language, North Fluminense State University - UENF, Av. Alberto Lamego, 2000 - Parque Califórnia - CEP 28013-602, Campos dos Goitacazes, Rio de Janeiro, Brazil; Computer Modelling Department, State of Rio de Janeiro University, Rua Bonfim, 25 - Vila Amélia - CEP 28625-570 - Nova Friburgo, Rio de Janeiro, Brazil.
Computer Modelling Department, State of Rio de Janeiro University, Rua Bonfim, 25 - Vila Amélia - CEP 28625-570 - Nova Friburgo, Rio de Janeiro, Brazil; Veiga de Almeida University, Rua Ibituruna, 108 - Maracanã - CEP 20271-020, Rio de Janeiro, Brazil.
Comput Methods Programs Biomed. 2018 Oct;165:139-149. doi: 10.1016/j.cmpb.2018.08.016. Epub 2018 Aug 24.
With populations aging worldwide, dementia has become a complex global health problem. Several machine learning methods have been applied to the task of predicting dementia. Because diagnosis is complex, the main challenge lies in distinguishing patients with some type of dementia from healthy people. Diagnosis in the early stages, in particular, improves the quality of life of both the patient and the family. This work presents a hybrid data mining model that integrates text mining with the mining of structured data. The model aims to assist specialists in diagnosing patients with clinically suspected dementia.
The experiments were conducted on a set of 605 medical records, each with 19 attributes, from patients with reported cognitive decline. First, a new structured attribute was created through a text mining process: it resulted from clustering each patient's pathological history, which was stored in an unstructured textual attribute. Classification algorithms (naïve Bayes, Bayesian belief networks, and decision trees) were then applied to obtain predictive models for Alzheimer's disease and mild cognitive impairment. Ensemble methods (Bagging, Boosting, and Random Forests) were used to improve the accuracy of the generated models. These methods were applied to two datasets: one containing only the original structured data, and the other containing the original structured data plus the new attribute produced by text mining (the hybrid model).
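The construction of the hybrid dataset can be illustrated with a minimal sketch. The abstract does not state which text representation or clustering algorithm was used, so the normalized term counts and plain k-means below are assumptions chosen for illustration. The field names (`age`, `history`, `history_cluster`), the helper functions, and the four toy records are all hypothetical and are not taken from the study's data.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a deliberately crude tokenizer).
    return [w for w in text.lower().split() if w.isalpha()]


def vectorize(texts):
    # Build L2-normalized term-count vectors over a shared, sorted vocabulary.
    vocab = sorted({w for t in texts for w in tokenize(t)})
    vectors = []
    for t in texts:
        counts = Counter(tokenize(t))
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def kmeans(vectors, seed_indices, iterations=10):
    # Plain k-means (assumed technique); fixed seed documents keep it reproducible.
    centroids = [list(vectors[i]) for i in seed_indices]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its current members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy records: one structured field plus a free-text history.
records = [
    {"age": 78, "history": "hypertension diabetes"},
    {"age": 69, "history": "depression anxiety"},
    {"age": 81, "history": "diabetes hypertension stroke"},
    {"age": 74, "history": "anxiety depression insomnia"},
]

labels = kmeans(vectorize([r["history"] for r in records]), seed_indices=(0, 1))

# Dataset 1: structured attributes only (free text dropped).
structured_only = [{k: v for k, v in r.items() if k != "history"} for r in records]

# Dataset 2 (hybrid): structured attributes plus the cluster label as a new attribute.
hybrid = [{**s, "history_cluster": lab} for s, lab in zip(structured_only, labels)]
```

In this sketch, the cluster label stands in for the new structured attribute. The same classifier would then be trained once on `structured_only` and once on `hybrid`, so that any difference in the metrics can be attributed to the added attribute.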
The accuracy metrics of the models obtained from the two datasets were compared. The results showed that the hybrid model was more effective in diagnostic prediction for the pathologies of interest.
Across the classification and clustering methods used, the best precision and sensitivity for the pathologies under study were obtained with hybrid models supported by ensemble methods.
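For reference, precision and sensitivity for one class can be computed from predicted and true labels as sketched below. This uses the standard definitions (precision = TP / (TP + FP), sensitivity = TP / (TP + FN)). The function name, the class labels, and the toy label lists are hypothetical and do not reproduce any result from the study.

```python
def precision_and_sensitivity(y_true, y_pred, positive):
    # Count true positives, false positives, and false negatives for one class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, sensitivity


# Hypothetical labels: "AD" = Alzheimer's disease, "MCI" = mild cognitive impairment.
y_true = ["AD", "AD", "AD", "AD", "MCI", "MCI"]
y_pred = ["AD", "AD", "MCI", "MCI", "AD", "MCI"]

precision_ad, sensitivity_ad = precision_and_sensitivity(y_true, y_pred, positive="AD")
```

Computing both values per pathology, once for the structured-only models and once for the hybrid models, gives the kind of side-by-side comparison described above.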