scDLC: a deep learning framework to classify large sample single-cell RNA-seq data.

Affiliations

College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen Key Laboratory of Advanced Machine Learning and Applications, Shenzhen University, Shenzhen, China.

Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong.

Publication information

BMC Genomics. 2022 Jul 12;23(1):504. doi: 10.1186/s12864-022-08715-1.

Abstract

BACKGROUND

Using single-cell RNA sequencing (scRNA-seq) data to diagnose disease is an effective technique in medical research. Several statistical methods have been developed for the classification of RNA sequencing (RNA-seq) data, including Poisson linear discriminant analysis (PLDA), negative binomial linear discriminant analysis (NBLDA), and zero-inflated Poisson logistic discriminant analysis (ZIPLDA). Nevertheless, few existing methods perform well on large-sample scRNA-seq data, particularly when their distributional assumptions are violated.

RESULTS

We propose a deep learning classifier (scDLC) for large-sample scRNA-seq data, based on long short-term memory recurrent neural networks (LSTMs). scDLC does not require prior knowledge of the data distribution; instead, it accounts for the dependencies among the most outstanding feature genes through the LSTM model. An LSTM is a special type of recurrent neural network that can learn long-term dependencies in a sequence.
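To make the idea concrete, the following is a minimal, illustrative sketch of how an LSTM can read a cell's selected feature genes as a sequence and output class probabilities. It is not the authors' implementation (see the repository cited in the Conclusions for that). The names `lstm_step`, `init_params`, and `classify`, the hidden size, the `log1p` transform of the counts, and the random untrained weights are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function expressed through tanh.
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def lstm_step(x, h, c, gates):
    """One LSTM step for a scalar input x (one gene) and state lists h, c."""
    n = len(h)

    def gate(name, act):
        w_x, w_h, b = gates[name]
        return [
            act(w_x[i] * x + sum(w_h[i][j] * h[j] for j in range(n)) + b[i])
            for i in range(n)
        ]

    f = gate("forget", sigmoid)   # how much of the old cell state to keep
    i_g = gate("input", sigmoid)  # how much of the candidate to write
    o = gate("output", sigmoid)   # how much of the cell state to expose
    g = gate("cand", math.tanh)   # candidate cell state
    c_new = [f[k] * c[k] + i_g[k] * g[k] for k in range(n)]
    h_new = [o[k] * math.tanh(c_new[k]) for k in range(n)]
    return h_new, c_new


def init_params(hidden, n_classes, seed=0):
    """Hypothetical random (untrained) weights, for demonstration only."""
    rng = random.Random(seed)

    def vec(n):
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    def mat(rows, cols):
        return [vec(cols) for _ in range(rows)]

    names = ("forget", "input", "output", "cand")
    gates = {name: (vec(hidden), mat(hidden, hidden), vec(hidden)) for name in names}
    return {
        "gates": gates,
        "out_w": mat(n_classes, hidden),
        "out_b": vec(n_classes),
        "hidden": hidden,
    }


def classify(counts, params):
    """Feed the feature genes one by one through the LSTM, then apply softmax."""
    n = params["hidden"]
    h, c = [0.0] * n, [0.0] * n
    for count in counts:
        # log1p is an assumed normalisation of raw read counts.
        h, c = lstm_step(math.log1p(count), h, c, params["gates"])
    logits = [
        sum(w * hj for w, hj in zip(row, h)) + b
        for row, b in zip(params["out_w"], params["out_b"])
    ]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Example: one cell with five selected feature genes, three candidate classes.
probs = classify([0, 3, 12, 0, 7], init_params(hidden=4, n_classes=3, seed=0))
```

Because the weights are untrained, the probabilities here carry no biological meaning; in practice the weights would be fitted to labelled cells, typically with a deep learning library rather than hand-written loops.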

CONCLUSIONS

Simulation studies show that scDLC performs consistently better than the existing methods across a wide range of settings with large sample sizes. Analyses of four real scRNA-seq datasets agree with the simulation results: scDLC performs best on all of them. The code, named "scDLC", is publicly available at https://github.com/scDLC-code/code.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/96e3/9281153/b9d278a943d0/12864_2022_8715_Fig1_HTML.jpg
