Briggs Farren B S, Sept Corriene
Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, 2103 Cornell Rd, Cleveland, OH 44106, USA.
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
Int J Environ Res Public Health. 2021 Mar 3;18(5):2518. doi: 10.3390/ijerph18052518.
(1) Background: Complex genetic relationships, including gene-gene (G × G; epistasis), gene(), and gene-environment (G × E) interactions, explain a substantial portion of the heritability in multiple sclerosis (MS). Machine learning and data mining methods are promising approaches for uncovering higher-order genetic relationships, but their use in MS has been limited. (2) Methods: Association rule mining (ARM), a combinatorial rule-based machine learning algorithm, was applied to genetic data for non-Latinx MS cases (n = 207) and controls (n = 179). The objective was to identify patterns (rules) among the known MS risk variants, including presence, absence, and 194 of the 200 common autosomal variants. Probabilistic measures (confidence and support) were used to mine rules. (3) Results: A total of 114 rules met the minimum requirements of 80% confidence and 5% support. The top-ranking rule by confidence consisted of , -rs56678847 and -rs6880809; carriers of these variants had a significantly greater risk for MS (odds ratio = 20.2, 95% CI: 8.5, 37.5; p = 4 × 10). Several variants were shared across rules; the most common was -rs78727559, which appeared in 32.5% of rules. (4) Conclusions: By applying a robust analytical framework to a modestly sized study population, we show that specific combinations of MS risk variants confer disproportionately elevated risk.
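To make the two rule-mining measures concrete, the following minimal Python sketch computes support and confidence for candidate rules of the form "variant combination → MS" and keeps those meeting the 5% support and 80% confidence thresholds named in the abstract. It is an illustration under stated assumptions, not the authors' pipeline: the toy data, the placeholder variant names (`varA`, `varB`, `varC`), the helper names, the choice of MS status as the rule consequent, and the brute-force enumeration (used here in place of an Apriori-style search) are all hypothetical.

```python
from itertools import combinations


def count(transactions, itemset):
    """Number of individuals carrying every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions)


def mine_rules(transactions, target, min_support=0.05,
               min_confidence=0.80, max_len=3):
    """Enumerate rules 'antecedent -> target' by brute force.

    support    = P(antecedent and target)
    confidence = P(target | antecedent)
    """
    n = len(transactions)
    items = sorted(set().union(*transactions) - {target})
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            n_ante = count(transactions, antecedent)
            if n_ante == 0:
                continue  # combination never observed; confidence undefined
            n_both = count(transactions, set(antecedent) | {target})
            supp = n_both / n
            conf = n_both / n_ante
            if supp >= min_support and conf >= min_confidence:
                rules.append((antecedent, supp, conf))
    # Rank by confidence, then by support, highest first.
    return sorted(rules, key=lambda r: (-r[2], -r[1]))


# Hypothetical toy data: each set lists the variants one individual carries,
# plus "MS" if that individual is a case.
transactions = [
    frozenset({"varA", "varB", "MS"}),
    frozenset({"varA", "varB", "MS"}),
    frozenset({"varA", "MS"}),
    frozenset({"varA"}),
    frozenset({"varB", "varC"}),
    frozenset({"varC"}),
    frozenset({"varC", "MS"}),
    frozenset({"varA", "varB", "MS"}),
    frozenset(),
    frozenset({"varB"}),
]

rules = mine_rules(transactions, target="MS")
for antecedent, supp, conf in rules:
    print(f"{' + '.join(antecedent)} -> MS  support={supp:.2f}  confidence={conf:.2f}")
```

In this toy set, a combination passes only if at least 80% of its carriers are cases and the carriers who are cases make up at least 5% of all individuals. Exhaustive enumeration is workable for three items but grows combinatorially; with roughly 200 variants, a pruning algorithm such as Apriori would be the usual choice.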