Discovery of optimal cell type classification marker genes from single cell RNA sequencing data.

Authors

Liu Angela, Peng Beverly, Pankajam Ajith V, Duong Thu Elizabeth, Pryhuber Gloria, Scheuermann Richard H, Zhang Yun

Affiliations

Department of Informatics, J. Craig Venter Institute, La Jolla, CA, USA.

Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.

Publication information

BMC Methods. 2024;1. doi: 10.1186/s44330-024-00015-2. Epub 2024 Nov 4.

DOI: 10.1186/s44330-024-00015-2
PMID: 40893796
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12396544/
Abstract

BACKGROUND

The use of single cell/nucleus RNA sequencing (scRNA-seq) technologies that quantitatively describe cell transcriptional phenotypes is revolutionizing our understanding of cell biology, leading to new insights in cell type identification, disease mechanisms, and drug development. The tremendous growth in scRNA-seq data has posed new challenges in efficiently characterizing data-driven cell types and identifying quantifiable marker genes for cell type classification. The use of machine learning and explainable artificial intelligence has emerged as an effective approach to study large-scale scRNA-seq data.

METHODS

NS-Forest is a random forest machine learning-based algorithm that aims to provide a scalable data-driven solution to identify minimum combinations of necessary and sufficient marker genes that capture cell type identity with maximum classification accuracy. Here, we describe the latest version, NS-Forest version 4.0, and its companion Python package (https://github.com/JCVenterInstitute/NSForest), with several enhancements that select marker gene combinations exhibiting highly selective expression patterns among closely related cell types and that perform marker gene selection more efficiently for large-scale scRNA-seq data atlases with millions of cells.
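The idea of a "minimum combination of marker genes with maximum classification accuracy" can be illustrated with a small sketch. This is not the NS-Forest implementation or its API: it replaces the random forest and decision tree steps with an exhaustive search over AND-rules on already binarized expression, and the function names, the `max_size` limit, and the default `beta=0.5` are assumptions chosen for illustration only.

```python
from itertools import combinations


def f_beta(tp, fp, fn, beta=0.5):
    """F-beta score from counts; beta < 1 weights precision above recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_marker_combination(expressed, labels, target, candidates,
                            max_size=2, beta=0.5):
    """Return (score, genes) for the best-scoring AND-rule.

    expressed  -- dict mapping gene name to a list of booleans, one per cell
                  (hypothetical pre-binarized "gene is on" calls)
    labels     -- list of cell type labels, one per cell
    target     -- the cell type the markers should identify
    candidates -- gene names to combine
    """
    best = (0.0, ())
    # Smaller combinations are tried first, so ties favor fewer genes.
    for size in range(1, max_size + 1):
        for genes in combinations(candidates, size):
            tp = fp = fn = 0
            for i, label in enumerate(labels):
                predicted = all(expressed[g][i] for g in genes)
                actual = label == target
                if predicted and actual:
                    tp += 1
                elif predicted:
                    fp += 1
                elif actual:
                    fn += 1
            score = f_beta(tp, fp, fn, beta)
            if score > best[0]:
                best = (score, genes)
    return best


# Toy data: g1 and g2 each leak into one other type, but together
# they are on only in type "A".
labels = ["A", "A", "A", "B", "B", "C"]
expressed = {
    "g1": [True, True, True, True, False, False],
    "g2": [True, True, True, False, False, True],
    "g3": [False, False, False, True, True, False],
}
score, genes = best_marker_combination(expressed, labels, "A",
                                       ["g1", "g2", "g3"])
print(score, genes)
```

In this toy case neither gene alone separates type "A" cleanly, while the pair does, which is the situation where a combination of markers is preferred over a single marker. An exhaustive search like this does not scale to real atlases; it only shows what is being optimized.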

RESULTS

By modularizing the final decision tree step, NS-Forest v4.0 can be used to compare the performance of user-defined marker genes with the NS-Forest computationally derived marker genes based on the decision tree classifiers. To quantify how well the identified markers exhibit the desired pattern of being exclusively expressed at high levels within their target cell types, we introduce the On-Target Fraction metric that ranges from 0 to 1, with a metric of 1 assigned to markers that are only expressed within their target cell types and not in cells of any other cell types. NS-Forest v4.0 outperforms previous versions in simulation studies and in its ability to identify markers with higher On-Target Fraction values for closely related cell types in real data, and outperforms other marker gene selection approaches for cell type classification with significantly higher F-beta scores when applied to datasets from three human organs: brain, kidney, and lung.
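The abstract states only the range and the meaning of the endpoints of the On-Target Fraction, not its formula. The sketch below assumes one formulation that satisfies those properties: the median expression of a gene in the target cell type divided by the sum of its median expression over all cell types. The function name and the zero-total convention are illustrative assumptions, not the NS-Forest package's API.

```python
from statistics import median


def on_target_fraction(expression, labels, target):
    """Assumed definition: target-type median / sum of per-type medians.

    expression -- list of non-negative expression values, one per cell
    labels     -- list of cell type labels, one per cell
    target     -- the cell type the gene is meant to mark
    """
    by_type = {}
    for value, label in zip(expression, labels):
        by_type.setdefault(label, []).append(value)
    medians = {label: median(values) for label, values in by_type.items()}
    total = sum(medians.values())
    if total == 0:
        # Gene is not expressed anywhere at the median level; treat as 0.
        return 0.0
    return medians.get(target, 0.0) / total


labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
exclusive = [5, 6, 7, 0, 0, 0, 0, 0, 0]   # expressed only in type A
leaky = [6, 6, 6, 2, 2, 2, 0, 0, 0]       # also expressed in type B

print(on_target_fraction(exclusive, labels, "A"))  # 1.0
print(on_target_fraction(leaky, labels, "A"))      # 0.75
```

Under this assumed definition, a marker expressed only in its target type scores 1, and the score falls toward 0 as expression spreads to other cell types, matching the behavior described in the abstract.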

DISCUSSION

Finally, we discuss potential use cases of the NS-Forest marker genes, including for designing spatial transcriptomics gene panels and semantic representation of cell types in biomedical ontologies, for the broad user community.
