Discovery of optimal cell type classification marker genes from single cell RNA sequencing data.

Authors

Liu Angela, Peng Beverly, Pankajam Ajith V, Duong Thu Elizabeth, Pryhuber Gloria, Scheuermann Richard H, Zhang Yun

Affiliations

Department of Informatics, J. Craig Venter Institute, La Jolla, CA, United States of America.

Intramural Research Program, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America.

Publication

bioRxiv. 2024 Jun 26:2024.04.22.590194. doi: 10.1101/2024.04.22.590194.

DOI:10.1101/2024.04.22.590194
PMID:38712147
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11071431/
Abstract

The use of single cell/nucleus RNA sequencing (scRNA-seq) technologies that quantitatively describe cell transcriptional phenotypes is revolutionizing our understanding of cell biology, leading to new insights in cell type identification, disease mechanisms, and drug development. The tremendous growth in scRNA-seq data has posed new challenges in efficiently characterizing data-driven cell types and identifying quantifiable marker genes for cell type classification. The use of machine learning and explainable artificial intelligence has emerged as an effective approach to study large-scale scRNA-seq data. NS-Forest is a random forest machine learning-based algorithm that aims to provide a scalable data-driven solution to identify minimum combinations of necessary and sufficient marker genes that capture cell type identity with maximum classification accuracy. Here, we describe the latest version, NS-Forest version 4.0 and its companion Python package (https://github.com/JCVenterInstitute/NSForest), with several enhancements to select marker gene combinations that exhibit highly selective expression patterns among closely related cell types and more efficiently perform marker gene selection for large-scale scRNA-seq data atlases with millions of cells. By modularizing the final decision tree step, NS-Forest v4.0 can be used to compare the performance of user-defined marker genes with the NS-Forest computationally derived marker genes based on the decision tree classifiers. To quantify how well the identified markers exhibit the desired pattern of being exclusively expressed at high levels within their target cell types, we introduce the On-Target Fraction metric that ranges from 0 to 1, with a metric of 1 assigned to markers that are only expressed within their target cell types and not in cells of any other cell types.
NS-Forest v4.0 outperforms previous versions in its ability to identify markers with higher On-Target Fraction values for closely related cell types, and outperforms other marker gene selection approaches at classification, with significantly higher F-beta scores, when applied to datasets from three human organs: brain, kidney, and lung.
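The abstract defines the On-Target Fraction only by its range and its endpoint (1 for a marker expressed solely in its target cell type). The sketch below shows one plausible way to compute such a score. It assumes, as our own hypothetical formulation rather than the paper's stated definition, that the score is the target cell type's share of a gene's summed per-cell-type mean expression, with non-negative expression values. The function names `on_target_fraction` and `marker_set_otf`, the averaging over a marker set, and the toy data are all illustrative assumptions.

```python
from __future__ import annotations

from statistics import mean


def on_target_fraction(expr_by_type: dict[str, list[float]], target: str) -> float:
    """Target type's share of a gene's summed per-cell-type mean expression.

    ASSUMED formulation (not taken from the paper): OTF = mean_target / sum_k mean_k.
    With non-negative expression this lies in [0, 1] and equals 1 when no other
    cell type expresses the gene.
    """
    type_means = {cell_type: mean(values) for cell_type, values in expr_by_type.items()}
    total = sum(type_means.values())
    if total == 0:
        return 0.0  # gene silent everywhere: treat as no on-target signal
    return type_means[target] / total


def marker_set_otf(genes: dict[str, dict[str, list[float]]], target: str) -> float:
    """Hypothetical summary for a marker combination: average OTF over its genes."""
    return mean(on_target_fraction(expr, target) for expr in genes.values())


# Toy per-cell expression values, grouped by cell type (hypothetical data).
toy_markers = {
    "GENE_A": {"T": [4.0, 6.0], "X": [0.0, 0.0], "Y": [0.0, 0.0]},  # only in T
    "GENE_B": {"T": [4.0, 6.0], "X": [1.0, 1.0], "Y": [0.0, 0.0]},  # leaks into X
}

if __name__ == "__main__":
    for gene, expr in toy_markers.items():
        print(gene, f"{on_target_fraction(expr, 'T'):.3f}")  # GENE_A 1.000, GENE_B 0.833
    print("set", f"{marker_set_otf(toy_markers, 'T'):.3f}")  # set 0.917
```

Under this formulation a gene with zero mean expression outside the target type scores exactly 1, and the score drops toward 0 as expression spreads into other cell types, which matches the behaviour the abstract describes.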

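The abstract evaluates marker genes with decision tree classifiers and reports F-beta scores. A minimal pure-Python sketch of that kind of evaluation follows, under stated assumptions that are ours and not the NSForest package API: each marker gets a depth-1 decision tree (one expression cutoff chosen by minimizing Gini impurity, with "above the cutoff" taken to mean the target type), a cell is called the target type only if it passes the cutoff for every marker, and beta = 0.5 is an illustrative choice that weights precision over recall. Function names and the toy data are hypothetical.

```python
from __future__ import annotations


def f_beta(y_true: list[bool], y_pred: list[bool], beta: float = 0.5) -> float:
    """F-beta = (1 + b^2) * P * R / (b^2 * P + R); beta < 1 favours precision."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum(p and not t for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def gini(positives: int, n: int) -> float:
    """Gini impurity of a node holding `positives` target cells out of `n`."""
    if n == 0:
        return 0.0
    p = positives / n
    return 2 * p * (1 - p)


def fit_stump_threshold(values: list[float], is_target: list[bool]) -> float:
    """Depth-1 decision tree: midpoint cut minimizing weighted Gini impurity."""
    candidates = sorted(set(values))
    best_cut, best_score = candidates[0], float("inf")
    for lo, hi in zip(candidates, candidates[1:]):
        cut = (lo + hi) / 2
        left = [t for v, t in zip(values, is_target) if v <= cut]
        right = [t for v, t in zip(values, is_target) if v > cut]
        score = (len(left) * gini(sum(left), len(left))
                 + len(right) * gini(sum(right), len(right))) / len(values)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut


def predict_with_markers(cells: list[dict[str, float]], is_target: list[bool],
                         markers: list[str]) -> list[bool]:
    """Call a cell the target type only if it exceeds every marker's cutoff."""
    cuts = {g: fit_stump_threshold([c[g] for c in cells], is_target) for g in markers}
    return [all(c[g] > cuts[g] for g in markers) for c in cells]


# Hypothetical toy data: two target cells followed by three non-target cells.
cells = [
    {"G1": 5.0, "G2": 3.0},
    {"G1": 4.0, "G2": 2.5},
    {"G1": 0.5, "G2": 2.8},
    {"G1": 4.5, "G2": 0.2},
    {"G1": 0.0, "G2": 0.0},
]
is_target = [True, True, False, False, False]

if __name__ == "__main__":
    for markers in (["G1"], ["G1", "G2"]):
        pred = predict_with_markers(cells, is_target, markers)
        print(markers, f"{f_beta(is_target, pred):.3f}")  # ['G1'] 0.714, then ['G1', 'G2'] 1.000
```

In this toy case G1 alone also admits one non-target cell (a false positive), so its F-beta is lower; adding G2 removes that cell, which illustrates why a combination of markers can classify a type better than any single gene.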

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9736/11215908/1da1eae10ac8/nihpp-2024.04.22.590194v2-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9736/11215908/5adf85a5d6f5/nihpp-2024.04.22.590194v2-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9736/11215908/f27273679d89/nihpp-2024.04.22.590194v2-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9736/11215908/364b8d09b4fa/nihpp-2024.04.22.590194v2-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9736/11215908/6748ef4eba28/nihpp-2024.04.22.590194v2-f0005.jpg

Similar articles

1. Discovery of optimal cell type classification marker genes from single cell RNA sequencing data.
   bioRxiv. 2024 Jun 26:2024.04.22.590194. doi: 10.1101/2024.04.22.590194.
2. Discovery of optimal cell type classification marker genes from single cell RNA sequencing data.
   BMC Methods. 2024;1. doi: 10.1186/s44330-024-00015-2. Epub 2024 Nov 4.
3. Prescription of Controlled Substances: Benefits and Risks.
4. Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
   Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
5. Sexual Harassment and Prevention Training.
6. Aspects of Genetic Diversity, Host Specificity and Public Health Significance of Single-Celled Intestinal Parasites Commonly Observed in Humans and Mostly Referred to as 'Non-Pathogenic'.
   APMIS. 2025 Sep;133(9):e70036. doi: 10.1111/apm.70036.
7. Management of urinary stones by experts in stone disease (ESD 2025).
   Arch Ital Urol Androl. 2025 Jun 30;97(2):14085. doi: 10.4081/aiua.2025.14085.
8. Does the Presence of Missing Data Affect the Performance of the SORG Machine-learning Algorithm for Patients With Spinal Metastasis? Development of an Internet Application Algorithm.
   Clin Orthop Relat Res. 2024 Jan 1;482(1):143-157. doi: 10.1097/CORR.0000000000002706. Epub 2023 Jun 12.
9. A rapid and systematic review of the clinical effectiveness and cost-effectiveness of paclitaxel, docetaxel, gemcitabine and vinorelbine in non-small-cell lung cancer.
   Health Technol Assess. 2001;5(32):1-195. doi: 10.3310/hta5320.
10. Healthcare workers' informal uses of mobile phones and other mobile devices to support their work: a qualitative evidence synthesis.
   Cochrane Database Syst Rev. 2024 Aug 27;8(8):CD015705. doi: 10.1002/14651858.CD015705.pub2.
