Gomes Luciano Gama da Silva, Cruz Álvaro Augusto Souza da, de Santana Maria Borges Rabêlo, Pinheiro Gabriela Pimentel, Santana Cinthia Vila Nova, Santos Carolina Barbosa Souza, Boorgula Meher Preethi, Campbell Monica, Machado Adelmir de Souza, Veiga Rafael Valente, Barnes Kathleen C, Costa Ryan Dos Santos, Figueiredo Camila Alexandrina
Instituto de Ciências da Saúde, Universidade Federal da Bahia, Salvador, Bahia, Brazil.
Programa de Controle da Asma na Bahia (ProAR), Universidade Federal da Bahia, Salvador, Bahia, Brazil.
J Allergy Clin Immunol Glob. 2024 May 18;3(3):100282. doi: 10.1016/j.jacig.2024.100282. eCollection 2024 Aug.
Asthma is a heterogeneous, multifactorial chronic inflammatory disease of the airways, which makes its accurate characterization a complex process. Identifying the genetic variations associated with asthma, and the molecular interactions across omics layers that confer risk of developing the disease, will therefore help unravel the biological pathways involved in its pathogenesis.
We sought to develop a predictive genetic panel for asthma using machine learning methods.
We tested 3 variable selection methods: Boruta's algorithm, the top 200 genome-wide association study markers ranked by their respective P values, and elastic net regression. Ten different algorithms were chosen for the classification tests. A predictive panel was built on the basis of joint scores between the classification algorithms.
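The second selection method can be illustrated with a minimal sketch, assuming the ranking is by association P value, smallest first. The marker IDs and P values here are randomly generated placeholders, and the function name `top_markers_by_p_value` is hypothetical; none of this comes from the study's data or code.

```python
import random


def top_markers_by_p_value(p_values, k=200):
    """Return the k marker IDs with the smallest association P values.

    Ties are broken by marker ID so the result is deterministic.
    """
    ranked = sorted(p_values.items(), key=lambda item: (item[1], item[0]))
    return [marker for marker, _ in ranked[:k]]


# Synthetic stand-in for GWAS summary statistics (assumption: 1000 markers
# with uniformly random P values; real data would come from a GWAS run).
rng = random.Random(0)
gwas_p_values = {f"snv_{i:04d}": rng.random() for i in range(1000)}

selected = top_markers_by_p_value(gwas_p_values, k=200)
print(len(selected), "markers selected")
```

The selected markers would then be passed as features to each classifier. Boruta and elastic net would replace only the selection step, leaving the classification step unchanged.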
Two variable selection methods, Boruta and the genome-wide association study ranking, were statistically similar in terms of average accuracy, whereas elastic net had the worst overall performance. The final predictive genetic panel comprised 155 single-nucleotide variants and achieved 91.18% accuracy, 92.75% sensitivity, and 89.55% specificity with the support vector machine algorithm. The markers used range from known single-nucleotide variants to variants not previously described in the literature. Our study shows the potential of genetic prediction panels with penalties tailored per marker, which can help identify the most suitable machine learning methods for complex results.
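For readers unfamiliar with how these three metrics relate, the following sketch computes them from a confusion matrix. The counts used (64 true positives, 5 false negatives, 60 true negatives, 7 false positives) are an assumption: they are one set of counts that reproduces the reported percentages, and the abstract does not state the actual test-set composition.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,    # all correct calls / all samples
        "sensitivity": tp / (tp + fn),    # asthma cases correctly identified
        "specificity": tn / (tn + fp),    # nonasthma cases correctly identified
    }


# Hypothetical counts chosen only because they match the reported figures.
metrics = classification_metrics(tp=64, fn=5, tn=60, fp=7)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under these assumed counts, sensitivity is 64/69, specificity is 60/67, and accuracy is 124/136.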
This method effectively distinguishes asthma from nonasthma, indicating its potential utility in clinical prediction and diagnosis.