Department of Genetics, Geisel School of Medicine, Dartmouth College, Lebanon, New Hampshire 03756, USA.
J Am Med Inform Assoc. 2013 Jul-Aug;20(4):603-12. doi: 10.1136/amiajnl-2012-001574. Epub 2013 Feb 26.
Detecting complex patterns of association between genetic or environmental risk factors and disease risk has become an important target for epidemiological research. In particular, strategies that accommodate multifactor interactions or heterogeneous patterns of association can offer new insights into association studies for which traditional analytic tools have had limited success.
To concurrently examine these phenomena, previous work has successfully considered the application of learning classifier systems (LCSs), a flexible class of evolutionary algorithms that distributes learned associations over a population of rules. Subsequent work dealt with the inherent problems of knowledge discovery and interpretation within these algorithms, allowing for the characterization of heterogeneous patterns of association. Whereas these previous advancements were evaluated using complex simulation studies, this study applied these collective works to a 'real-world' genetic epidemiology study of bladder cancer susceptibility.
We replicated the identification of previously characterized factors that modify bladder cancer risk, namely single nucleotide polymorphisms from a DNA repair gene, and smoking. Furthermore, we identified potentially heterogeneous groups of subjects characterized by distinct patterns of association. Cox proportional hazards models comparing clinical outcome variables between the cases of the two largest groups yielded a significant, meaningful difference in survival time in years (survivorship). A marginally significant difference in recurrence time was also noted. These results support the hypothesis that an LCS approach can offer greater insight into complex patterns of association.
This methodology appears to be well suited to the dissection of disease heterogeneity, a key component in the advancement of personalized medicine.