Crohn's Disease Prediction Using Sequence Based Machine Learning Analysis of Human Microbiome.

Authors

Unal Metehan, Bostanci Erkan, Ozkul Ceren, Acici Koray, Asuroglu Tunc, Guzel Mehmet Serdar

Affiliations

Department of Computer Engineering, Ankara University, 06830 Ankara, Turkey.

Department of Pharmaceutical Microbiology, Faculty of Pharmacy, Hacettepe University, 06230 Ankara, Turkey.

Publication information

Diagnostics (Basel). 2023 Sep 1;13(17):2835. doi: 10.3390/diagnostics13172835.

Abstract

Human microbiota refers to the trillions of microorganisms that inhabit our bodies and have been found to have a substantial impact on human health and disease. By sampling the microbiota, it is possible to generate massive quantities of data for analysis using Machine Learning algorithms. In this study, we employed several modern Machine Learning techniques to predict Inflammatory Bowel Disease using raw sequence data. The dataset was obtained from NCBI, preprocessed into graph representations, and converted into a structured form. Seven well-known Machine Learning algorithms were used: Random Forest, Support Vector Machines, Extreme Gradient Boosting, Light Gradient Boosting Machine (LightGBM), Gaussian Naïve Bayes, Logistic Regression, and k-Nearest Neighbor. Grid Search was employed for hyperparameter optimization. The performance of the Machine Learning models was evaluated using accuracy, precision, F-score, kappa, and area under the receiver operating characteristic curve. Additionally, McNemar's test was conducted to assess the statistical significance of the differences between models. The data was constructed using k-mer lengths of 3, 4 and 5. The LightGBM model outperformed the other models, with 67.24%, 74.63% and 76.47% accuracy for k-mer lengths of 3, 4 and 5, respectively. The LightGBM model also demonstrated the best performance in each metric. The study showed promising results in predicting disease from raw sequence data. Finally, McNemar's test found statistically significant differences between the Machine Learning approaches.
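
To make two of the steps mentioned in the abstract more concrete, the following is a minimal sketch of (1) turning a raw DNA sequence into a fixed-length k-mer frequency vector and (2) comparing two classifiers with McNemar's test. It is an illustration under stated assumptions, not the authors' pipeline: the function names (`kmer_vocabulary`, `kmer_frequencies`, `mcnemar_test`), the restriction to the `ACGT` alphabet, the normalization to relative frequencies, and the use of the continuity-corrected chi-square form of the test are all choices made here for the example. The abstract does not specify these details.

```python
from itertools import product
from math import erfc, sqrt

ALPHABET = "ACGT"  # assumption: only unambiguous DNA bases are counted


def kmer_vocabulary(k):
    """Return all 4**k possible DNA k-mers in lexicographic order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_frequencies(sequence, k):
    """Return the relative frequency of every k-mer in one sequence.

    Windows containing characters outside ACGT (for example N) are skipped.
    The result always has 4**k entries, so sequences of different lengths
    map to feature vectors of the same size.
    """
    vocab = kmer_vocabulary(k)
    counts = dict.fromkeys(vocab, 0)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    total = sum(counts.values())
    return [counts[kmer] / total if total else 0.0 for kmer in vocab]


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's test (with continuity correction) for two classifiers.

    b = samples model A classifies correctly and model B incorrectly,
    c = samples model A classifies incorrectly and model B correctly.
    Returns (statistic, p_value) using a chi-square distribution with
    one degree of freedom.
    """
    rows = list(zip(y_true, pred_a, pred_b))
    b = sum(1 for t, a, p in rows if a == t and p != t)
    c = sum(1 for t, a, p in rows if a != t and p == t)
    if b + c == 0:
        return 0.0, 1.0  # the two models never disagree
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    features = kmer_frequencies("ACGTACGT", 3)
    print(len(features))  # 64 features for k = 3

    labels = [1] * 10
    model_a = [1] * 10            # hypothetical predictions
    model_b = [0] * 8 + [1] * 2   # hypothetical predictions
    print(mcnemar_test(labels, model_a, model_b))
```

With k = 3, 4 and 5 the feature vectors have 64, 256 and 1024 entries respectively, which is one reason longer k-mers can carry more discriminative information while also increasing dimensionality. In practice, a library routine such as the McNemar implementation in statsmodels would usually be preferred over a hand-written one.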

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a79/10486516/808ec513a99d/diagnostics-13-02835-g001.jpg
