A comprehensive machine learning for high throughput Tuberculosis sequence analysis, functional annotation, and visualization.

Author information

Hossain Md Saddam, Khandocar Md Parvez, Riti Farzana Akter, Ali Md Yeakub, Dey Prithbey Raj, Haque S M Jahurul, Metouekel Amira, Mengistie Atrsaw Asrat, Bourhia Mohammed, Khallouki Farid, Almaary Khalid S

Affiliations

Department of Biomedical Engineering, Faculty of Engineering and Technology, Islamic University, Kushtia, 7003, Bangladesh.

Department of Industrial and Production Engineering, Faculty of Mechanical Engineering, Dhaka University of Engineering and Technology, Gazipur, 1707, Bangladesh.

Publication information

Sci Rep. 2025 Jul 16;15(1):25866. doi: 10.1038/s41598-025-98654-0.

Abstract

Machine learning (ML), a branch of artificial intelligence (AI), lets computers learn from data under human guidance, detect trends, and make predictions, and such software can adapt and improve as new information arrives. In imaging, for example, pattern recognition is used to predict outcomes, diagnose disorders, and suggest treatments. Tuberculosis (TB) remains the most common bacterial disease affecting humans. The World Health Organization reported that 1.3 million people died from TB in 2022, and the death rate can reach 66% if proper treatment is not provided. We trained supervised ML algorithms, namely XGBoost, Logistic Regression, Random Forest Classifier, AdaBoost, and Support Vector Machine, to classify TB patients from large RNA-sequence count data. These algorithms achieved prediction accuracies of 0.963, 0.739, 0.773, 0.866, and 0.866, respectively. This article highlights feature importance techniques applied to XGBoost, the model with the highest prediction accuracy (0.963), to identify significant genes in the TB RNA-sequence count data. Using these key ML features, we identified 20 pathways, 24 gene ontology (GO) terms, 20 hub genes, and 22 drugs. Next, we applied advanced computational techniques, including pathway analysis, GO analysis, hub-protein and protein-protein interaction (PPI) analysis, transcriptomic and miRNA interaction analysis, and drug-protein interaction analysis, to analyze 100 highly expressed genes.
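The abstract names the classifiers and the feature-importance step but not the preprocessing, data split, or hyperparameters. The following is only a minimal sketch of that kind of workflow under stated assumptions: a synthetic Poisson count matrix replaces real RNA-seq data, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost (a different gradient-boosting implementation, swapped in so the example needs only scikit-learn), and the `log1p` transform, 75/25 stratified split, `top_k = 20`, and all model settings are illustrative choices, not the authors' pipeline.

```python
# Minimal sketch (not the paper's method): compare supervised classifiers on an
# RNA-seq-like count matrix, then rank genes by tree-based feature importance.
# Assumptions: synthetic Poisson counts; GradientBoostingClassifier as an
# XGBoost stand-in; log1p, split ratio, and hyperparameters are illustrative.
import numpy as np
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data: 200 samples x 500 genes; label 1 = "TB", 0 = "control".
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 500
labels = rng.integers(0, 2, size=n_samples)
rates = np.full((n_samples, n_genes), 20.0)
rates[labels == 1, :10] *= 3.0          # only the first 10 genes carry signal
counts = rng.poisson(rates)
gene_names = np.array([f"gene_{i}" for i in range(n_genes)])

# Log-transform counts to compress their dynamic range, then hold out 25%.
X = np.log1p(counts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=0)

# Scale-sensitive models (LR, SVM) get a StandardScaler; tree ensembles do not need one.
models = {
    "GradientBoosting (XGBoost stand-in)": GradientBoostingClassifier(random_state=0),
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {accuracies[name]:.3f}")

# Rank genes by the boosted model's impurity-based importances; keep the top k.
booster = models["GradientBoosting (XGBoost stand-in)"]
top_k = 20
order = np.argsort(booster.feature_importances_)[::-1][:top_k]
top_genes = gene_names[order].tolist()
print(top_genes)
```

With real data, the top-ranked gene list would be the input to the downstream pathway, GO, PPI, and drug-interaction analyses; impurity-based importances can favour high-variance features, so permutation importance is a common cross-check.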
