
Intrinsic entropy model for feature selection of scRNA-seq data.

Affiliations

State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology, CAS Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China.

University of Chinese Academy of Sciences, Beijing 100049, China.

Publication information

J Mol Cell Biol. 2022 Jun 8;14(2). doi: 10.1093/jmcb/mjac008.

DOI:10.1093/jmcb/mjac008
PMID:35102420
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9175189/
Abstract

Recent advances in single-cell RNA sequencing (scRNA-seq) technologies have led to extensive studies of cellular heterogeneity and cell-to-cell variation. However, frequent dropout events and noise in scRNA-seq data compromise the accuracy of downstream analyses, in particular clustering analysis, whose accuracy depends heavily on the selected feature genes. Here, by deriving an entropy decomposition formula, we propose a feature selection method, the intrinsic entropy (IE) model, to identify informative genes for accurate clustering analysis. Specifically, we decompose the total entropy (TE) of each gene as TE = IE + EE and extract its IE by eliminating the 'noisy' fluctuation, or extrinsic entropy (EE). We show that the IE of each gene reflects the regulatory fluctuation of that gene in a cellular process, and thus high-IE genes carry rich information for analyzing cell types or states. To validate the performance of high-IE genes, we compare the IE model with other representative methods on both simulated and real single-cell datasets. The results show that our IE model is not only broadly applicable to, and robust across, different clustering and classification methods, but also sensitive to novel cell types. Our results also demonstrate that, in contrast to its total entropy/fluctuation, the intrinsic entropy/fluctuation of a gene serves as information rather than noise.
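The abstract states the decomposition TE = IE + EE but not how each term is computed. As a minimal sketch, assuming discretized expression values and known (or preliminary) cell-state labels Z, an additive split of a gene's entropy can be illustrated with the chain rule H(X) = I(X; Z) + H(X | Z): the within-state entropy H(X | Z) plays the role of the noise-like EE, and the remainder I(X; Z) plays the role of IE. This is a swapped-in illustration, not the authors' IE formula, and the functions `entropy`, `conditional_entropy`, `decompose` and `select_features` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_entropy(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """H(X | Z): entropy of X inside each label group, weighted by group size."""
    n = len(values)
    groups: Dict[Hashable, List[Hashable]] = {}
    for x, z in zip(values, labels):
        groups.setdefault(z, []).append(x)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def decompose(values: Sequence[Hashable], labels: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Return (TE, IE, EE) under the ASSUMED chain-rule split TE = IE + EE.

    Assumption (not the paper's derivation): EE is the within-state
    entropy H(X | Z) and IE is the mutual information I(X; Z) = TE - EE.
    """
    te = entropy(values)
    ee = conditional_entropy(values, labels)
    ie = max(0.0, te - ee)  # clamp tiny negative floating-point error
    return te, ie, ee


def select_features(expr: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable], k: int) -> List[str]:
    """Rank genes by their (illustrative) IE score and keep the top k."""
    scores = {gene: decompose(v, labels)[1] for gene, v in expr.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]  # hypothetical cell-state labels
    expr = {
        "geneA": [0, 0, 0, 3, 3, 3],  # switches with cell state
        "geneB": [0, 3, 0, 3, 0, 3],  # fluctuates mostly within states
        "geneC": [1, 1, 1, 1, 1, 1],  # constant
    }
    print(decompose(expr["geneA"], labels))
    print(select_features(expr, labels, 2))  # ['geneA', 'geneB']
```

In this toy input, geneA gets IE = 1 bit and EE = 0, geneB has TE = 1 bit but most of it is within-state (EE ≈ 0.918), and the constant geneC has zero entropy, so the top-2 selection keeps geneA and geneB. The design choice is that fluctuation explained by cell state counts as information, while fluctuation left over inside a state counts as noise, which mirrors the abstract's claim without reproducing its formula.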

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c8a0/9175189/c9846580241e/mjac008fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c8a0/9175189/b20224cb7833/mjac008fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c8a0/9175189/4f6e394a5dc0/mjac008fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c8a0/9175189/3857437b5e59/mjac008fig4.jpg
