Molecular Classification and Interpretation of Amyotrophic Lateral Sclerosis Using Deep Convolution Neural Networks and Shapley Values.

Affiliations

GenieUs Genomics, 19a Boundary St, Darlinghurst, NSW 2010, Australia.

School of Biotechnology and Biomolecular Sciences, Faculty of Science, The University of New South Wales, Sydney, NSW 2033, Australia.

Publication information

Genes (Basel). 2021 Oct 30;12(11):1754. doi: 10.3390/genes12111754.

Abstract

Amyotrophic lateral sclerosis (ALS) is a prototypical neurodegenerative disease characterized by progressive degeneration of motor neurons, which severely impairs the control of voluntary muscle movement. Most of the non-additive genetic aberrations responsible for ALS make its molecular classification very challenging, as do limited sample size, the curse of dimensionality, class imbalance, and noise in the data. Deep learning methods have been successful in many related areas, but when used directly with RNA expression features for ALS molecular classification they have low minority-class accuracy and lack explainability. In this paper, we propose a deep-learning-based framework for the molecular classification and interpretation of ALS. Our framework trains a convolutional neural network (CNN) on images obtained by converting RNA expression values into pixels using the DeepInsight similarity technique. We then employed Shapley additive explanations (SHAP) to extract the pixels with higher relevance to the ALS classifications, and mapped these pixels back to the genes that made them up. This enabled us to classify ALS samples with high accuracy for a minority class and to identify genes that might play an important role in ALS molecular classification. Taken together with the RNA expression images classified by the CNN, our preliminary analysis of the genes identified by SHAP interpretation demonstrates the value of using machine learning to perform molecular classification of ALS and to uncover disease-associated genes.
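
The two mapping steps in the abstract (expression values to pixels, and relevant pixels back to genes) can be illustrated with a minimal sketch. This is not the authors' code or the DeepInsight implementation. It assumes each gene already has a 2-D coordinate from some dimensionality-reduction step, and it takes the per-pixel attribution map as a given input rather than computing it with SHAP. The gene names, coordinates, expression values, attribution values, and the function names `assign_pixels`, `expression_to_image`, and `top_genes` are all hypothetical.

```python
from collections import defaultdict


def assign_pixels(coords, size):
    """Scale 2-D gene coordinates onto a size x size grid.

    Returns a dict gene -> (row, col). Several genes may share one pixel.
    """
    xs = [x for x, _ in coords.values()]
    ys = [y for _, y in coords.values()]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def scale(value, lo, hi):
        if hi == lo:  # all genes share this coordinate
            return 0
        return min(size - 1, int((value - lo) / (hi - lo) * size))

    return {
        gene: (scale(y, y_min, y_max), scale(x, x_min, x_max))
        for gene, (x, y) in coords.items()
    }


def expression_to_image(expression, gene_to_pixel, size):
    """Build one image for one sample.

    Genes that land on the same pixel are averaged; empty pixels stay 0.
    """
    sums = [[0.0] * size for _ in range(size)]
    counts = [[0] * size for _ in range(size)]
    for gene, value in expression.items():
        row, col = gene_to_pixel[gene]
        sums[row][col] += value
        counts[row][col] += 1
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(size)]
        for r in range(size)
    ]


def top_genes(attribution, gene_to_pixel, k):
    """Return the k occupied pixels with the largest absolute attribution,
    each paired with the genes that were mapped onto it."""
    pixel_to_genes = defaultdict(list)
    for gene, pixel in gene_to_pixel.items():
        pixel_to_genes[pixel].append(gene)
    ranked_pixels = sorted(
        pixel_to_genes,
        key=lambda p: abs(attribution[p[0]][p[1]]),
        reverse=True,
    )
    return [(p, sorted(pixel_to_genes[p])) for p in ranked_pixels[:k]]


# Hypothetical inputs: 2-D gene coordinates and one sample's expression values.
coords = {
    "GENE_A": (0.0, 0.0),
    "GENE_B": (0.1, 0.1),
    "GENE_C": (1.0, 1.0),
    "GENE_D": (1.0, 0.0),
}
expression = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.0, "GENE_D": 3.0}

gene_to_pixel = assign_pixels(coords, size=2)
image = expression_to_image(expression, gene_to_pixel, size=2)

# Hypothetical per-pixel attribution map, standing in for SHAP output.
attribution = [[0.5, -0.1], [0.0, -0.9]]
ranked = top_genes(attribution, gene_to_pixel, k=2)
print(ranked)
```

Because several genes can share a pixel, a highly ranked pixel yields a set of candidate genes rather than a single gene, which is why the back-mapping keeps a list per pixel.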

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab87/8626003/e3a452df6698/genes-12-01754-g001.jpg
