Gene targeting in amyotrophic lateral sclerosis using causality-based feature selection and machine learning.

Affiliations

Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY, 11549, USA.

Institute of Molecular Medicine, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA.

Publication information

Mol Med. 2023 Jan 24;29(1):12. doi: 10.1186/s10020-023-00603-y.

DOI: 10.1186/s10020-023-00603-y
PMID: 36694130
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9872307/
Abstract

BACKGROUND

Amyotrophic lateral sclerosis (ALS) is a rare progressive neurodegenerative disease that affects upper and lower motor neurons. As the molecular basis of the disease is still elusive, the development of high-throughput sequencing technologies, combined with data mining techniques and machine learning methods, could provide remarkable results in identifying pathogenetic mechanisms. High dimensionality is a major problem when applying machine learning techniques in biomedical data analysis, since a huge number of features is available for a limited number of samples. The aim of this study was to develop a methodology for training interpretable machine learning models in the classification of ALS and ALS-subtypes samples, using gene expression datasets.
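The high-dimensionality problem described above can be illustrated with a minimal sketch. It uses purely synthetic noise, not gene expression data, and the sample and feature counts are arbitrary assumptions. With few samples and thousands of features, some pure-noise feature will correlate strongly with the class labels by chance alone, which is why feature selection matters before training a classifier.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def max_spurious_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between fixed balanced labels and pure-noise features."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]  # alternating 0/1 labels
    best = 0.0
    for _ in range(n_features):
        # Each feature is independent Gaussian noise, unrelated to the labels.
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, labels)))
    return best

few = max_spurious_correlation(n_samples=20, n_features=10)
many = max_spurious_correlation(n_samples=20, n_features=5000)
print(f"10 noise features:   max |r| = {few:.2f}")
print(f"5000 noise features: max |r| = {many:.2f}")
```

Because both calls share a seed, the 5000-feature run contains the same first 10 features, so its maximum can only be equal or larger.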

METHODS

We performed dimensionality reduction in gene expression data using a semi-automated preprocessing systematic gene selection procedure using Statistically Equivalent Signature (SES), a causality-based feature selection algorithm, followed by Boosted Regression Trees (XGBoost) and Random Forest to train the machine learning classifiers. The SHapley Additive exPlanations (SHAP values) were used for interpretation of the machine learning classifiers. The methodology was developed and tested using two distinct publicly available ALS RNA-seq datasets. We evaluated the performance of SES as a dimensionality reduction method against: (a) Least Absolute Shrinkage and Selection Operator (LASSO), and (b) Local Outlier Factor (LOF).
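The overall shape of such a workflow (select a small gene subset, train a classifier on it, estimate accuracy on held-out samples) can be sketched as follows. This is not the authors' implementation: a simple univariate mean-difference filter stands in for SES, a nearest-centroid classifier stands in for XGBoost and Random Forest, and the data are synthetic. All function names and parameters are hypothetical. The one design point the sketch is meant to show is that feature selection is repeated inside each cross-validation fold, so the held-out sample never influences which features are chosen.

```python
import random
import statistics

def make_data(n_samples=30, n_features=200, n_informative=5, shift=4.0, seed=1):
    """Synthetic matrix: only the first n_informative columns differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += shift * label
        X.append(row)
        y.append(label)
    return X, y

def select_features(X, y, k):
    """Rank features by |difference of class means| / summed spread; keep the top k."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
        scores.append((abs(statistics.fmean(a) - statistics.fmean(b)) / spread, j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])

def fit_centroids(X, y, cols):
    """Per-class mean vector restricted to the selected columns."""
    return {
        lab: [statistics.fmean([row[j] for row, l in zip(X, y) if l == lab]) for j in cols]
        for lab in set(y)
    }

def predict(centroids, row, cols):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    point = [row[j] for j in cols]
    return min(
        centroids,
        key=lambda lab: sum((p - c) ** 2 for p, c in zip(point, centroids[lab])),
    )

def leave_one_out_accuracy(X, y, k=5):
    """Leave-one-out estimate; selection is redone on each training fold."""
    hits = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        cols = select_features(X_train, y_train, k)
        hits += predict(fit_centroids(X_train, y_train, cols), X[i], cols) == y[i]
    return hits / len(X)

X, y = make_data()
print("features selected on all samples:", select_features(X, y, 5))
print("leave-one-out accuracy:", leave_one_out_accuracy(X, y))
```

Selecting features once on the full dataset and then cross-validating only the classifier would leak information from the test samples and inflate the accuracy estimate.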

RESULTS

The proposed methodology achieved 85.18% accuracy for the classification of cerebellum or frontal cortex samples as C9orf72-related familial ALS, sporadic ALS or healthy samples. Importantly, the genes identified as the most determinative have also been reported as disease-associated in ALS literature. When tested in the evaluation dataset, the methodology achieved 88.89% accuracy for the classification of sporadic ALS motor neuron samples. When LASSO was used as feature selection method instead of SES, the accuracy of the machine learning classifiers ranged from 74.07 to 96.30%, depending on tissue assessed, while LOF underperformed significantly (77.78% accuracy for the classification of pooled cerebellum and frontal cortex samples).
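For readers unfamiliar with how LASSO acts as a feature selector: its L1 penalty drives many coefficients to exactly zero, and the features with non-zero coefficients are the ones kept. The sketch below shows this with cyclic coordinate descent and soft-thresholding on synthetic data. It is an illustration under stated assumptions, not the configuration used in the study: the abstract does not give the penalty strength or loss, and this example uses a squared-error loss on a continuous synthetic response with no intercept. The value `lam=0.5` and all names are hypothetical.

```python
import random

def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Covariance of feature j with the residual that excludes
            # feature j's own current contribution.
            rho = sum(X[i][j] * (residual[i] + w[j] * X[i][j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / z[j]
            if new_w != w[j]:
                delta = new_w - w[j]
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_w
    return w

# Synthetic data: only features 0 and 1 drive the response.
rng = random.Random(0)
n, p = 60, 30
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 3.0 * row[1] + rng.gauss(0.0, 0.1) for row in X]

w = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, coef in enumerate(w) if coef != 0.0]
print("features kept by LASSO:", selected)
```

The surviving coefficients are shrunk toward zero relative to the true values, which is one reason a separate classifier is often refit on the selected features.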

CONCLUSIONS

Using SES, we addressed the challenge of high dimensionality in gene expression data analysis, and we trained accurate machine learning ALS classifiers, specific for the gene expression patterns of different disease subtypes and tissue samples, while identifying disease-associated genes.
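The gene-level interpretation mentioned in the Methods rests on Shapley values, which split a model's prediction among its input features. A minimal sketch of the underlying idea follows. It computes exact Shapley values by enumerating every feature ordering for a tiny hypothetical model. Practical SHAP implementations rely on approximations or model-specific algorithms, because this brute-force approach grows factorially with the number of features. The toy model, the input, and the all-zero baseline are assumptions made for illustration only.

```python
from itertools import permutations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)      # start with every feature "absent"
        previous = predict(current)
        for j in order:
            current[j] = x[j]         # reveal feature j
            value = predict(current)
            phi[j] += value - previous
            previous = value
    return [total / factorial(n) for total in phi]

def toy_model(features):
    """Hypothetical scorer with an interaction between features 0 and 1."""
    g0, g1, g2 = features
    return 2.0 * g0 + g0 * g1 - 1.0 * g2

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print("Shapley values:", phi)
print("sum of values:", sum(phi))
print("prediction minus baseline prediction:", toy_model(x) - toy_model(baseline))
```

The interaction term `g0 * g1` is shared equally between features 0 and 1, and the values always sum to the difference between the prediction and the baseline prediction.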


Figures
Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ddff/9872307/3738dff74aee/10020_2023_603_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ddff/9872307/9ba12f9094ab/10020_2023_603_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ddff/9872307/3878500e1c0c/10020_2023_603_Fig3_HTML.jpg
