Random forest-integrated analysis in AD and LATE brain transcriptome-wide data to identify disease-specific gene expression.

Affiliations

University of Kentucky, Lexington, Kentucky, United States of America.

Qingdao University, Qingdao, Shandong, China.

Publication information

PLoS One. 2021 Sep 7;16(9):e0256648. doi: 10.1371/journal.pone.0256648. eCollection 2021.

DOI:10.1371/journal.pone.0256648

PMID:34492068

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8423259/

Abstract

Alzheimer's disease (AD) is a complex neurodegenerative disorder that affects thinking, memory, and behavior. Limbic-predominant age-related TDP-43 encephalopathy (LATE) is a recently identified common neurodegenerative disease that mimics the clinical symptoms of AD. The development of drugs to prevent or treat these neurodegenerative diseases has been slow, partly because the genes associated with these diseases are incompletely understood. A notable hindrance from a data analysis perspective is that the clinical samples for patients and controls are usually highly imbalanced, which makes it challenging to apply most existing machine learning algorithms directly to such datasets. Meeting this data analysis challenge is critical, as more specific identification of disease-associated genes may enable new insights into underlying disease-driving mechanisms, help find biomarkers and, in turn, improve prospects for effective treatment strategies. To detect disease-associated genes based on imbalanced transcriptome-wide data, we proposed an integrated multiple random forests (IMRF) algorithm. IMRF is effective in identifying putative genes that differentiate subjects having LATE and/or AD from controls based on transcriptome-wide data, thereby enabling effective discrimination between these samples. Various forms of validation, such as cross-domain verification of our method on other datasets, improved and competitive classification performance using the identified genes, effectiveness on testing data with a classifier that is completely independent of decision trees and random forests, and relationships with prior AD and LATE studies on the genes linked to neurodegeneration, all testify to the effectiveness of IMRF in identifying genes with altered expression in LATE and/or AD. We conclude that IMRF, as an effective feature selection algorithm for imbalanced data, shows promise for facilitating the development of new gene biomarkers as well as targets for effective strategies of disease prevention and treatment.
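
The abstract does not describe how IMRF works internally, so the following is only a rough sketch of one common way to combine several random forests on imbalanced data, not the authors' implementation. It assumes the larger class (controls) is split into chunks the size of the smaller class (cases), that one forest is trained per balanced subset, and that a gene is kept when it ranks in the top `top_k` for more than half of the forests. To stay dependency-free it uses a toy forest of one-split trees; the function names, the voting rule, `top_k`, and the synthetic data are all illustrative assumptions.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(X, y, features):
    """Best single-threshold split over `features`: (feature, impurity decrease)."""
    n, parent = len(y), gini(y)
    best_feature, best_gain = None, 0.0
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if gain > best_gain:
                best_feature, best_gain = f, gain
    return best_feature, best_gain


def forest_importance(X, y, n_trees, rng):
    """Toy forest of one-split trees: bootstrap rows, random feature subset per tree."""
    n, p = len(X), len(X[0])
    m = max(1, int(p ** 0.5))  # features tried per tree
    importance = [0.0] * p
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        Xb, yb = [X[i] for i in rows], [y[i] for i in rows]
        feature, gain = best_split(Xb, yb, rng.sample(range(p), m))
        if feature is not None:
            importance[feature] += gain  # accumulate impurity decrease
    return importance


def integrated_selection(cases, controls, top_k=2, n_trees=100, seed=0):
    """Train one forest per balanced subset; keep features most forests rank highly."""
    rng = random.Random(seed)
    controls = controls[:]
    rng.shuffle(controls)
    size = len(cases)
    votes, n_forests = Counter(), 0
    for start in range(0, len(controls) - size + 1, size):
        chunk = controls[start:start + size]  # as many controls as cases
        X = cases + chunk
        y = [1] * len(cases) + [0] * len(chunk)
        imp = forest_importance(X, y, n_trees, rng)
        ranked = sorted(range(len(imp)), key=lambda f: imp[f], reverse=True)
        votes.update(ranked[:top_k])
        n_forests += 1
    return sorted(f for f, v in votes.items() if v > n_forests / 2)


# Synthetic data (an assumption for illustration): 6 "genes", of which
# genes 0 and 1 are shifted in cases; 10 cases versus 40 controls.
data_rng = random.Random(1)
n_genes = 6


def sample(shift):
    return [data_rng.gauss(shift if g < 2 else 0.0, 1.0) for g in range(n_genes)]


cases = [sample(6.0) for _ in range(10)]
controls = [sample(0.0) for _ in range(40)]
selected = integrated_selection(cases, controls)
print("genes selected by a majority of forests:", selected)
```

Training on balanced subsets keeps each forest from being dominated by the majority class, and voting across forests favors genes whose importance is stable across different control subsets. A production version would more likely use an established random forest library and full-depth trees.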

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1332/8423259/ced3e0a8e898/pone.0256648.g001.jpg

Similar articles

Ensemble of random forests One vs. Rest classifiers for MCI and AD prediction using ANOVA cortical and subcortical feature selection and partial least squares.

J Neurosci Methods. 2018 May 15;302:47-57. doi: 10.1016/j.jneumeth.2017.12.005. Epub 2017 Dec 11.

Random forest feature selection, fusion and ensemble strategy: Combining multiple morphological MRI measures to discriminate among healthy elderly, MCI, cMCI and Alzheimer's disease patients: From the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.

J Neurosci Methods. 2018 May 15;302:14-23. doi: 10.1016/j.jneumeth.2017.12.010. Epub 2017 Dec 18.

Biomarker Extraction Based on Subspace Learning for the Prediction of Mild Cognitive Impairment Conversion.

Biomed Res Int. 2021 Sep 2;2021:5531940. doi: 10.1155/2021/5531940. eCollection 2021.

Classification of Alzheimer's disease and prediction of mild cognitive impairment-to-Alzheimer's conversion from structural magnetic resource imaging using feature ranking and a genetic algorithm.

Comput Biol Med. 2017 Apr 1;83:109-119. doi: 10.1016/j.compbiomed.2017.02.011. Epub 2017 Feb 27.

Multimodal Data Analysis of Alzheimer's Disease Based on Clustering Evolutionary Random Forest.

IEEE J Biomed Health Inform. 2020 Oct;24(10):2973-2983. doi: 10.1109/JBHI.2020.2973324. Epub 2020 Feb 11.

A Classification Algorithm by Combination of Feature Decomposition and Kernel Discriminant Analysis (KDA) for Automatic MR Brain Image Classification and AD Diagnosis.

Comput Math Methods Med. 2019 Dec 30;2019:1437123. doi: 10.1155/2019/1437123. eCollection 2019.

Alzheimer's disease diagnosis from diffusion tensor images using convolutional neural networks.

PLoS One. 2020 Mar 24;15(3):e0230409. doi: 10.1371/journal.pone.0230409. eCollection 2020.

An ensemble learning system for a 4-way classification of Alzheimer's disease and mild cognitive impairment.

J Neurosci Methods. 2018 May 15;302:75-81. doi: 10.1016/j.jneumeth.2018.03.008. Epub 2018 Mar 22.

A Machine Learning-Based Holistic Approach to Predict the Clinical Course of Patients within the Alzheimer's Disease Spectrum.

J Alzheimers Dis. 2022;85(4):1639-1655. doi: 10.3233/JAD-210573.

Cited by

An exploratory study of high-throughput transcriptomic analysis reveals novel mRNA biomarkers for acute myocardial infarction using integrated methods.

Sci Rep. 2025 Mar 11;15(1):8436. doi: 10.1038/s41598-025-92757-4.

Mapping Knowledge Landscapes and Emerging Trends in AI for Dementia Biomarkers: Bibliometric and Visualization Analysis.

J Med Internet Res. 2024 Aug 8;26:e57830. doi: 10.2196/57830.

Transcriptome analysis of the Japanese eel (Anguilla japonica) during larval metamorphosis.

BMC Genomics. 2024 Jun 11;25(1):585. doi: 10.1186/s12864-024-10459-z.

Deep learning algorithm reveals probabilities of stage-specific time to conversion in individuals with neurodegenerative disease LATE.

Alzheimers Dement (N Y). 2022 Nov 3;8(1):e12363. doi: 10.1002/trc2.12363. eCollection 2022.

Machine Learning Approach Predicts Probability of Time to Stage-Specific Conversion of Alzheimer's Disease.

J Alzheimers Dis. 2022;90(2):891-903. doi: 10.3233/JAD-220590.

Algorithmic Stability and Generalization of an Unsupervised Feature Selection Algorithm.

Adv Neural Inf Process Syst. 2021 Dec;34:19860-19875.

Risk Factors and Prediction Models for Nonalcoholic Fatty Liver Disease Based on Random Forest.

Comput Math Methods Med. 2022 Aug 9;2022:8793659. doi: 10.1155/2022/8793659. eCollection 2022.

References

Harnessing the paradoxical phenotypes of APOE ɛ2 and APOE ɛ4 to identify genetic modifiers in Alzheimer's disease.

Alzheimers Dement. 2021 May;17(5):831-846. doi: 10.1002/alz.12240. Epub 2020 Dec 7.

Limbic-predominant age-related TDP-43 encephalopathy differs from frontotemporal lobar degeneration.

Brain. 2020 Sep 1;143(9):2844-2857. doi: 10.1093/brain/awaa219.

Limbic Predominant Age-Related TDP-43 Encephalopathy (LATE): Clinical and Neuropathological Associations.

J Neuropathol Exp Neurol. 2020 Mar 1;79(3):305-313. doi: 10.1093/jnen/nlz126.

A transcriptomic analysis of Nsmce1 overexpression in mouse hippocampal neuronal cell by RNA sequencing.

Funct Integr Genomics. 2020 May;20(3):459-470. doi: 10.1007/s10142-019-00728-6. Epub 2019 Dec 2.

Sex differences in gene expression patterns associated with the allele.

F1000Res. 2019 Apr 5;8:387. doi: 10.12688/f1000research.18671.2. eCollection 2019.

Integrating Gene and Protein Expression Reveals Perturbed Functional Networks in Alzheimer's Disease.

Cell Rep. 2019 Jul 23;28(4):1103-1116.e4. doi: 10.1016/j.celrep.2019.06.073.

Meta-Analysis of Gene Expression and Identification of Biological Regulatory Mechanisms in Alzheimer's Disease.

Front Neurosci. 2019 Jul 3;13:633. doi: 10.3389/fnins.2019.00633. eCollection 2019.

Limbic-predominant age-related TDP-43 encephalopathy (LATE): consensus working group report.

Brain. 2019 Jun 1;142(6):1503-1527. doi: 10.1093/brain/awz099.

The Major Risk Factors for Alzheimer's Disease: Age, Sex, and Genes Modulate the Microglia Response to Aβ Plaques.

Cell Rep. 2019 Apr 23;27(4):1293-1306.e6. doi: 10.1016/j.celrep.2019.03.099.

Genetic Variants Associated With Neurodegenerative Diseases Regulate Gene Expression in Immune Cell CD14+ Monocytes.

Front Genet. 2018 Dec 18;9:666. doi: 10.3389/fgene.2018.00666. eCollection 2018.
