Immune cell type signature discovery and random forest classification for analysis of single cell gene expression datasets.

Affiliations

Oncology Data Science, Merck Healthcare KGaA, Darmstadt, Germany.

Faculty of Biosciences, Heidelberg University, Heidelberg, Germany.

Publication information

Front Immunol. 2023 Aug 4;14:1194745. doi: 10.3389/fimmu.2023.1194745. eCollection 2023.

DOI: 10.3389/fimmu.2023.1194745
PMID: 37609075
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10441575/
Abstract

BACKGROUND

Robust immune cell gene expression signatures are central to the analysis of single-cell studies. Nearly all known sets of immune cell signatures have each been derived from only a single gene expression dataset. Integrating multiple datasets could yield higher-quality immune cell signatures, which could in turn serve as better inputs to machine learning-based cell type classification approaches.

RESULTS

We established a novel workflow for discovering immune cell type signatures, based primarily on gene-versus-gene expression similarity. It leverages multiple datasets (here, seven single-cell expression datasets from six different cancer types) and yielded eleven immune cell type-specific gene expression signatures. We used these signatures to train random forest classifiers that assign immune cell types in single-cell RNA-seq datasets. On independent benchmarking datasets, these classifiers achieved prediction results similar to or better than those of commonly used cell type assignment methods. In random forest-based cell type classification, our gene signature set yields higher prediction scores than other published immune cell type gene sets. We further demonstrate, by re-analyzing a published IFN stimulation experiment, how our approach helps to avoid bias in downstream statistical analyses.
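The abstract states that signature discovery is based primarily on gene-versus-gene expression similarity pooled across several datasets, but it does not specify the similarity measure or how datasets are combined. The following is only a minimal sketch under our own assumptions, not the authors' algorithm: a signature starts from hypothetical seed marker genes, similarity is Pearson correlation across cells within each dataset, scores are averaged over datasets, and a hypothetical cut-off `min_corr` decides membership. The random forest classification step is not shown.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def discover_signature(datasets, seed_genes, min_corr=0.5):
    """Expand seed marker genes into a signature via gene-versus-gene similarity.

    datasets   : list of dicts, one per dataset, mapping gene -> per-cell expression list
    seed_genes : known markers for one cell type (hypothetical starting point)
    min_corr   : hypothetical cut-off on the cross-dataset mean correlation

    A candidate gene's score is its mean correlation with the seed genes within each
    dataset, averaged over datasets, so it must co-vary with the seeds consistently.
    Only genes measured in every dataset are considered.
    """
    shared = set.intersection(*(set(d) for d in datasets))
    seeds = [s for s in seed_genes if s in shared]
    if not seeds:
        return []
    signature = set(seeds)
    for gene in shared - signature:
        per_dataset = [mean(pearson(d[gene], d[s]) for s in seeds) for d in datasets]
        if mean(per_dataset) >= min_corr:
            signature.add(gene)
    return sorted(signature)


# Hypothetical toy data: two small datasets with different cells but shared genes.
dataset_a = {"CD3E": [5, 4, 0, 0], "CD3D": [4, 5, 1, 0], "MS4A1": [0, 0, 6, 5]}
dataset_b = {"CD3E": [0, 3, 6], "CD3D": [1, 3, 5], "MS4A1": [4, 2, 0]}

print(discover_signature([dataset_a, dataset_b], seed_genes=["CD3E"]))  # → ['CD3D', 'CD3E']
```

Averaging per-dataset scores, rather than concatenating all cells, keeps one large dataset from dominating; this is a design choice of the sketch, not something the abstract reports.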

DISCUSSION AND CONCLUSION

We demonstrated the quality of our immune cell signatures and their strong performance in a random forest-based cell typing approach. We argue that classifying cells using our comparatively slim gene sets together with a random forest-based approach not only matches or outperforms widely used published approaches but also facilitates unbiased downstream statistical analyses of differential gene expression between cell types, for significantly more genes than previous cell classification algorithms allow.
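One way to read the last claim, and this reading is our assumption rather than something the abstract spells out: testing a gene for differential expression between cell types that were assigned using that same gene reuses the labelling information, so only genes outside the classifier's feature set can be tested without that circularity. The sketch below counts those genes for two hypothetical feature sets; the gene names and set sizes are invented for illustration.

```python
def genes_for_unbiased_de(all_genes, classifier_genes):
    """Return genes that were NOT used as classifier features, preserving input order.

    Assumed criterion (illustrative): a gene used to assign cell types is excluded
    from differential expression testing between those same cell types.
    """
    used = set(classifier_genes)
    return [g for g in all_genes if g not in used]


# Hypothetical gene universe and two hypothetical classifier feature sets.
all_genes = [f"GENE{i}" for i in range(1, 1001)]  # 1,000 measured genes
slim_signature = all_genes[:50]                   # compact signature-based feature set
broad_features = all_genes[:800]                  # large feature set, e.g. many variable genes

for name, features in [("slim", slim_signature), ("broad", broad_features)]:
    eligible = genes_for_unbiased_de(all_genes, features)
    print(f"{name}: {len(eligible)} genes eligible for unbiased DE")
```

With these toy numbers the slim feature set leaves 950 genes testable and the broad one only 200, which is the direction of the effect the authors describe.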


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5041/10441575/57617c669289/fimmu-14-1194745-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5041/10441575/41d29d5fd2e5/fimmu-14-1194745-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5041/10441575/dfdf3c94586b/fimmu-14-1194745-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5041/10441575/a14f4644babc/fimmu-14-1194745-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5041/10441575/5d81a8c78a7e/fimmu-14-1194745-g005.jpg

Similar articles

1. Immune cell type signature discovery and random forest classification for analysis of single cell gene expression datasets.
   Front Immunol. 2023 Aug 4;14:1194745. doi: 10.3389/fimmu.2023.1194745. eCollection 2023.
2. Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers.
   Med Phys. 2018 Jul;45(7):3449-3459. doi: 10.1002/mp.12967. Epub 2018 Jun 13.
3. Comparative analysis of weka-based classification algorithms on medical diagnosis datasets.
   Technol Health Care. 2023;31(S1):397-408. doi: 10.3233/THC-236034.
4. Integration of Random Forest Classifiers and Deep Convolutional Neural Networks for Classification and Biomolecular Modeling of Cancer Driver Mutations.
   Front Mol Biosci. 2019 Jun 11;6:44. doi: 10.3389/fmolb.2019.00044. eCollection 2019.
5. Methodology to identify a gene expression signature by merging microarray datasets.
   Comput Biol Med. 2023 Jun;159:106867. doi: 10.1016/j.compbiomed.2023.106867. Epub 2023 Apr 11.
6. Illustrating the biological functions and diagnostic value of transmembrane protein family members in glioma.
   Front Oncol. 2023 Mar 31;13:1145676. doi: 10.3389/fonc.2023.1145676. eCollection 2023.
7. Discovery of optimal cell type classification marker genes from single cell RNA sequencing data.
   bioRxiv. 2024 Jun 26:2024.04.22.590194. doi: 10.1101/2024.04.22.590194.
8. Identification of a Transcriptomic Prognostic Signature by Machine Learning Using a Combination of Small Cohorts of Prostate Cancer.
   Front Genet. 2020 Nov 25;11:550894. doi: 10.3389/fgene.2020.550894. eCollection 2020.
9. Microbiome-based classification models for fresh produce safety and quality evaluation.
   Microbiol Spectr. 2024 Apr 2;12(4):e0344823. doi: 10.1128/spectrum.03448-23. Epub 2024 Mar 6.
10. Machine learning algorithm for precise prediction of 2'-O-methylation (Nm) sites from experimental RiboMethSeq datasets.
   Methods. 2022 Jul;203:311-321. doi: 10.1016/j.ymeth.2022.03.007. Epub 2022 Mar 18.

Cited by

1. Global trends in machine learning applications for single-cell transcriptomics research.
   Hereditas. 2025 Aug 16;162(1):164. doi: 10.1186/s41065-025-00528-y.
2. Expression signatures with specificity for type I and II IFN response and relevance for autoimmune diseases and cancer.
   J Transl Med. 2025 Jul 3;23(1):740. doi: 10.1186/s12967-025-06628-7.
3. Conotoxins: Classification, Prediction, and Future Directions in Bioinformatics.
   Toxins (Basel). 2025 Feb 9;17(2):78. doi: 10.3390/toxins17020078.