Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan 64200, Pakistan.
School of Computer Science, University College Dublin, D04 V1W8 Dublin, Ireland.
Genes (Basel). 2022 Dec 26;14(1):71. doi: 10.3390/genes14010071.
Genetic disorders result from mutations in the deoxyribonucleic acid (DNA) sequence, which can be acquired or inherited from parents. Such mutations may lead to fatal diseases such as Alzheimer's, cancer, and hemochromatosis. Recently, artificial intelligence-based methods have shown considerable success in the prediction and prognosis of different diseases. Such methods can be used to predict genetic disorders at an early stage from genome data, enabling timely treatment. This study focuses on the multi-label multi-class problem and makes two major contributions to genetic disorder prediction. First, a novel feature engineering approach is proposed in which the class probabilities from extra tree (ET) and random forest (RF) classifiers are concatenated to form the feature set for model training. Second, the study uses the classifier chain approach, in which multiple classifiers are joined in a chain and the predictions from all preceding classifiers are used by the succeeding classifiers to make the final prediction. Because the data are multi-label and multi-class, macro accuracy, Hamming loss, and α-evaluation score are used to evaluate performance. Results suggest that extreme gradient boosting (XGB) produces the best scores, with a 92% α-evaluation score and an 84% macro accuracy score. XGB substantially outperforms state-of-the-art approaches in both predictive performance and computational complexity.
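The two contributions described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the two targets (`y1`, `y2`) and the helper names `probability_features`, `fit_predict_chain`, and `hamming` are hypothetical, the ET and RF models are fitted on the first target only (an assumption, since the abstract does not say how the probabilities are produced for multiple targets), and scikit-learn's `GradientBoostingClassifier` stands in for XGB so the example needs only scikit-learn and NumPy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import train_test_split


def probability_features(X_train, y_train, X_test, seed=0):
    """Concatenate ET and RF class probabilities into a new feature set."""
    et = ExtraTreesClassifier(n_estimators=50, random_state=seed).fit(X_train, y_train)
    rf = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X_train, y_train)
    train = np.hstack([et.predict_proba(X_train), rf.predict_proba(X_train)])
    test = np.hstack([et.predict_proba(X_test), rf.predict_proba(X_test)])
    return train, test


def fit_predict_chain(F_train, Y_train, F_test, seed=0):
    """Classifier chain: each link sees the features plus all earlier labels."""
    preds = []
    aug_train, aug_test = F_train, F_test
    for j in range(Y_train.shape[1]):
        # Stand-in for XGB (assumption): gradient boosting from scikit-learn.
        clf = GradientBoostingClassifier(random_state=seed).fit(aug_train, Y_train[:, j])
        pred = clf.predict(aug_test)
        preds.append(pred)
        # True labels extend the training features; predictions extend the test features.
        aug_train = np.column_stack([aug_train, Y_train[:, j]])
        aug_test = np.column_stack([aug_test, pred])
    return np.column_stack(preds)


def hamming(Y_true, Y_pred):
    """Fraction of (sample, target) cells predicted incorrectly."""
    return float(np.mean(Y_true != Y_pred))


# Synthetic two-target data: y1 has 3 classes, y2 refines y1 into 6 classes.
X, y1 = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)
y2 = 2 * y1 + (X[:, 0] > 0).astype(int)
Y = np.column_stack([y1, y2])

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# Step 1: probability-based features from ET and RF (fitted on the first target).
F_train, F_test = probability_features(X_train, Y_train[:, 0], X_test)

# Step 2: chain of classifiers over the two targets.
Y_pred = fit_predict_chain(F_train, Y_train, F_test)
loss = hamming(Y_test, Y_pred)
print(f"Hamming loss: {loss:.3f}")
```

In this sketch the training-set probabilities come from models fitted on the same rows, so they are optimistic; out-of-fold probabilities (for example via `sklearn.model_selection.cross_val_predict` with `method="predict_proba"`) would be a more careful choice. The Hamming loss is computed by hand because the targets here are multi-class rather than binary indicators.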