Enhancing selection of alcohol consumption-associated genes by random forest.

Affiliations

Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA.

Department of Anatomy and Neurobiology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA 02118, USA.

Publication information

Br J Nutr. 2024 Jun 28;131(12):2058-2067. doi: 10.1017/S0007114524000795. Epub 2024 Apr 12.

Abstract

Machine learning methods have been used to identify omics markers for a variety of phenotypes. We aimed to examine whether a supervised machine learning algorithm can improve the identification of alcohol-associated transcriptomic markers. In this study, we analysed array-based, whole-blood-derived expression data for 17 873 gene transcripts in 5508 Framingham Heart Study participants. Using the Boruta algorithm, a supervised random forest (RF)-based feature selection method, we selected twenty-five alcohol-associated transcripts. In a testing set (30 % of all study participants), the AUC (area under the receiver operating characteristic curve) values of these twenty-five transcripts were 0·73, 0·69 and 0·66 for non-drinkers v. moderate drinkers, non-drinkers v. heavy drinkers and moderate drinkers v. heavy drinkers, respectively. The AUC values of the transcripts selected by the Boruta method were comparable to those of transcripts identified using conventional linear regression models; for example, the AUC values of 1958 transcripts identified by conventional linear regression models (false discovery rate < 0·2) were 0·74, 0·66 and 0·65, respectively. With Bonferroni correction for the twenty-five Boruta method-selected transcripts and three CVD risk factors (i.e. at P < 6·7e-4), we observed that thirteen transcripts were associated with obesity, three transcripts with type 2 diabetes and one transcript with hypertension. For example, we observed that alcohol consumption was inversely associated with the expression of three transcripts [gene symbols missing in source], that two transcripts [gene symbols missing in source] were positively associated with obesity, and that one transcript [gene symbol missing in source] was inversely associated with hypertension. In conclusion, using a supervised machine learning method, the RF-based Boruta algorithm, we identified novel alcohol-associated gene transcripts.
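
Two of the quantities quoted in the abstract can be illustrated with a few lines of arithmetic. The sketch below is not the authors' code: it assumes a family-wise significance level of 0·05 (the abstract gives only the resulting threshold) and uses hypothetical helper names and made-up toy scores. It shows how a Bonferroni threshold for 25 transcripts × 3 risk factors comes out near 6·7e-4, and how an AUC for a two-group comparison can be computed as the fraction of correctly ordered case-control pairs.

```python
from typing import Sequence


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def pairwise_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a positive scores above a negative.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Assumption: alpha = 0.05; 25 transcripts x 3 CVD risk factors = 75 tests.
threshold = bonferroni_threshold(0.05, 25 * 3)
print(f"{threshold:.1e}")  # 6.7e-04

# Toy predicted scores (invented for illustration, not study data).
heavy_drinkers = [0.9, 0.6, 0.4]
non_drinkers = [0.5, 0.3, 0.2]
print(round(pairwise_auc(heavy_drinkers, non_drinkers), 3))
```

The pairwise form is quadratic in group size and is meant only to make the definition concrete; for a testing set of this study's size a rank-based implementation would be the practical choice.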
