

Cell Type Annotation Model Selection: General-Purpose vs. Pattern-Aware Feature Gene Selection in Single-Cell RNA-Seq Data.

Affiliation

School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada.

Publication information

Genes (Basel). 2023 Feb 26;14(3):596. doi: 10.3390/genes14030596.

DOI:10.3390/genes14030596
PMID:36980868
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10048047/
Abstract

With advances in high-throughput sequencing technology, a growing body of research has set out to reveal heterogeneity among cells. Functional differences between individual cells are inferred from differences in their gene expression profiles. Although clustering methods have been observed to perform well, manually annotating the resulting cell clusters remains a challenge that calls for faster, more scalable solutions. On the other hand, owing to the lack of sufficient labelled datasets, only a few supervised techniques have been applied to cell type identification, yet they have obtained more robust results than clustering methods. A recent study showed that adding a feature selection step helped a support vector machine (SVM) outperform other classifiers in different scenarios. In this article, we compare and evaluate the performance of two state-of-the-art supervised methods, XGBoost and SVM, each combined with information gain as the feature selection method. Experiments on three standard scRNA-seq datasets indicate that XGBoost annotates cell types automatically within a simpler and more scalable framework. The results also point to the potential of combining boosting tree approaches with deep neural networks to capture the underlying information in single-cell RNA-Seq data more effectively. The approach can also be used to identify marker genes and to support other applications in biological studies.
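The abstract names information gain as the feature selection step that precedes the SVM and XGBoost classifiers. As a rough illustration only, the sketch below scores each gene by IG(Y; X) = H(Y) - H(Y | X), where Y is the cell type label and X is the gene's expression. It assumes each gene is binarized as "expressed" (value above a threshold, 0 by default) versus "not expressed"; the paper's actual discretization, library, and number of retained genes are not stated here, so the `threshold` and `k` parameters and the helper names (`entropy`, `information_gain`, `select_top_genes`) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(expr: Sequence[float], labels: Sequence[str],
                     threshold: float = 0.0) -> float:
    """IG(Y; X) = H(Y) - H(Y | X) for one gene.

    Assumption: the gene is binarized as expr > threshold (expressed)
    versus expr <= threshold (not expressed).
    """
    n = len(labels)
    on = [y for x, y in zip(expr, labels) if x > threshold]
    off = [y for x, y in zip(expr, labels) if x <= threshold]
    # Weighted entropy of the labels within each expression group.
    conditional = sum(len(part) / n * entropy(part) for part in (on, off) if part)
    return entropy(labels) - conditional


def select_top_genes(matrix: Sequence[Sequence[float]], gene_names: Sequence[str],
                     labels: Sequence[str], k: int,
                     threshold: float = 0.0) -> Tuple[List[str], Dict[str, float]]:
    """Rank genes (columns of a cells x genes matrix) by information gain; keep the top k."""
    scores = {}
    for j, gene in enumerate(gene_names):
        column = [row[j] for row in matrix]
        scores[gene] = information_gain(column, labels, threshold)
    # sorted() is stable, so tied genes keep their original column order.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy example with made-up values: 4 cells x 3 genes.
matrix = [
    [5.0, 0.0, 9.0],  # T cell
    [3.0, 0.0, 8.0],  # T cell
    [0.0, 4.0, 7.0],  # B cell
    [0.0, 2.0, 9.0],  # B cell
]
gene_names = ["CD3E", "MS4A1", "ACTB"]
labels = ["T", "T", "B", "B"]
top_genes, scores = select_top_genes(matrix, gene_names, labels, k=2)
print(top_genes, scores)
```

In this invented toy matrix, the two genes that are expressed in only one cell type each separate the labels perfectly (IG = 1 bit), while the gene expressed in every cell carries no label information (IG = 0). The reduced gene set would then be passed to a classifier such as SVM or XGBoost, which this sketch does not show.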


Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6289/10048047/cdde57b8570a/genes-14-00596-g001.jpg
