Center of Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Muniz, Fundação Oswaldo Cruz, Salvador, Brazil.
Universidade Federal de Juiz de Fora, Juiz de Fora, Minas Gerais, Brazil.
BMC Bioinformatics. 2018 Jun 26;19(1):245. doi: 10.1186/s12859-018-2233-z.
The prevalence of asthma and allergies has increased in recent decades, making them a serious global health problem. They are complex diseases with strong contextual influence, so advanced machine learning tools such as genetic programming could be important for understanding the causal mechanisms behind these conditions. Here, we applied multiobjective grammar-based genetic programming (MGGP) to a dataset composed of 1047 subjects. The dataset contains information on environmental, psychosocial, socioeconomic, nutritional and infectious factors collected from participating children. The objective of this work is to generate models that explain the occurrence of asthma and of two markers of allergy: presence of IgE antibody against common allergens, and skin prick test (SPT) positivity for common allergens.
For asthma, the average accuracy of the models was higher with MGGP than with C4.5. For IgE, the average accuracy was higher with MGGP than with both logistic regression and C4.5. MGGP reached levels of accuracy similar to random forest (RF), but unlike RF, MGGP was able to generate models that were easy to interpret.
MGGP has shown that infectious, psychosocial, nutritional, hygiene, and socioeconomic factors may be related in ways so intricate that they could hardly be detected using traditional regression-based epidemiological techniques. The MGGP algorithm was implemented in C++ and is available in the repository: http://bitbucket.org/ciml-ufjf/ciml-lib .