Osteoarthritis Research Unit, University of Montreal Hospital Research Centre (CRCHUM), 900 Saint-Denis, R11.412, Montreal, QC, H2X 0A9, Canada.
Unidad de Genomica, Grupo de Investigación de Reumatología (GIR), Instituto de Investigación Biomédica de A Coruña (INIBIC), Complexo Hospitalario Universitario de A Coruña (CHUAC), Sergas, Universidade da Coruña, A Coruña, Spain.
BMC Med. 2022 Sep 12;20(1):316. doi: 10.1186/s12916-022-02491-1.
Knee osteoarthritis is the most prevalent chronic debilitating musculoskeletal disease. Current treatments are only symptomatic. Improving on this requires a robust prediction model that stratifies patients at an early stage according to their risk of joint structure disease progression. Some genetic factors, including single nucleotide polymorphism (SNP) genes and mitochondrial (mt)DNA haplogroups/clusters, have been linked to this disease. We aimed to determine, for the first time, whether machine learning applied to some SNP genes and mtDNA haplogroups/clusters, alone or combined, could predict early knee osteoarthritis structural progressors.
Participants (n = 901) were first classified according to their probability of being structural progressors. Genotyping included the SNP genes TP63, FTO, GNL3, DUS4L, GDF5, SUPT3H, MCF2L, and TGFA; the mtDNA haplogroups H, J, T, Uk, and others; and the clusters HV, TJ, KU, and C-others. These genetic inputs were considered for prediction together with the major risk factors of osteoarthritis, namely age and body mass index (BMI). Seven supervised machine learning methodologies were evaluated. The support vector machine was used to generate gender-based models. The best input combination was assessed using sensitivity and synergy analyses. Validation was performed using tenfold cross-validation and an external cohort (TASOAC).
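The input preparation and tenfold cross-validation described here can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the one-hot coding of genotypes, the three-genotype list for rs8044769, and the stratification of folds by class are all assumptions made for illustration, and the participant in the usage lines is synthetic. No classifier is fitted; the sketch only shows how age, BMI, an mtDNA haplogroup, and an FTO genotype could be turned into a numeric vector and how ten train/test partitions could be built.

```python
import random
from collections import defaultdict

# Category lists. The haplogroups follow the genotyping description;
# the three FTO genotypes for rs8044769 are an assumed coding.
HAPLOGROUPS = ["H", "J", "T", "Uk", "others"]
FTO_GENOTYPES = ["CC", "CT", "TT"]


def one_hot(value, categories):
    """Return a 0/1 indicator list marking which category `value` is."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def encode(participant):
    """Build a feature vector: age, BMI, then one-hot genetic inputs."""
    return (
        [float(participant["age"]), float(participant["bmi"])]
        + one_hot(participant["haplogroup"], HAPLOGROUPS)
        + one_hot(participant["fto"], FTO_GENOTYPES)
    )


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs, spreading each class over k folds.

    Stratification is an assumption of this sketch; the abstract only
    states that tenfold cross-validation was used.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal the shuffled indices of this class round-robin into folds.
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Usage with a synthetic participant and synthetic labels (not study data).
example = {"age": 61, "bmi": 28.5, "haplogroup": "H", "fto": "CT"}
vector = encode(example)
splits = list(stratified_kfold([0] * 30 + [1] * 20, k=10))
```

Each index appears in exactly one test fold, so every participant is used for validation once and for training nine times. A classifier such as a support vector machine would be fitted on each `train` partition and scored on the matching `test` partition.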
From 277 models, two were defined. Both used age and BMI. The first added the SNP genes TP63, DUS4L, GDF5, and FTO and reached an accuracy of 85.0%; the second combined mtDNA haplogroups with the SNP genes FTO and SUPT3H and reached an accuracy of 82.5%. The highest impact was associated with haplogroup H, the presence of the CT alleles for rs8044769 at FTO, and the absence of AA for rs10948172 at SUPT3H. Validation accuracy was excellent for both models, at about 95% with cross-validation and 90.5% and 85.7%, respectively, with the external cohort.
This study introduces a novel source of decision support in precision medicine. For the first time, two models were developed, consisting of (i) age, BMI, TP63, DUS4L, GDF5, and FTO and (ii) age, BMI, mtDNA haplogroup, FTO, and SUPT3H, the second being the optimum one because it has one fewer variable. Such a framework is translational and would benefit patients at risk of structural progressive knee osteoarthritis.