Optimized cell type signatures revealed from single-cell data by combining principal feature analysis, mutual information, and machine learning.

Authors

Caliskan Aylin, Caliskan Deniz, Rasbach Lauritz, Yu Weimeng, Dandekar Thomas, Breitenbach Tim

Affiliation

Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, 97074 Würzburg, Germany.

Publication

Comput Struct Biotechnol J. 2023 Jun 5;21:3293-3314. doi: 10.1016/j.csbj.2023.06.002. eCollection 2023.

DOI:10.1016/j.csbj.2023.06.002
PMID:37333862
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10276237/
Abstract

Machine learning techniques are well suited to analyzing expression data from single cells, and they affect all areas of the field, from cell annotation and clustering to signature identification. The presented framework evaluates how well a selected gene set separates defined phenotypes or cell groups. This overcomes the present difficulty of objectively and correctly identifying a small gene set with high information content for separating phenotypes; corresponding code scripts are provided. The small but meaningful subset of the original genes (the feature space) makes the differences between phenotypes, including those found by machine learning, easier for humans to interpret, and may even help turn correlations between genes and phenotypes into a causal explanation. For the feature selection task, principal feature analysis (PFA) is used, which reduces redundant information while selecting genes that carry the information needed to separate the phenotypes. In this context, the framework makes unsupervised learning explainable, because it reveals cell-type-specific signatures. Apart from a Seurat preprocessing tool and the PFA script, the pipeline can optionally use mutual information to balance the accuracy and size of the gene set. A validation part is also provided to evaluate the information content of the selected genes with respect to separating the phenotypes; binary classification and multiclass classification of 3 or 4 groups are studied. Results from different single-cell data sets are presented. In each, only about ten out of more than 30000 genes are identified as carrying the relevant information. The code is provided in a GitHub repository at https://github.com/AC-PHD/Seurat_PFA_pipeline.
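To make the role of mutual information more concrete, the following is a minimal sketch of ranking genes by their mutual information with phenotype labels. It is not the authors' pipeline and does not implement principal feature analysis; the function names (`mutual_information`, `binarize`, `rank_genes`, `make_toy_data`), the median-based binarization, and the synthetic data are all assumptions made for illustration.

```python
import math
import random
import statistics
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    # Sum over observed pairs of p(x,y) * log2(p(x,y) / (p(x) * p(y))).
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def binarize(values):
    """Discretize expression values: 1 if above the median, else 0 (an assumption)."""
    median = statistics.median(values)
    return [1 if v > median else 0 for v in values]


def rank_genes(expression, labels):
    """Return (gene, MI) pairs sorted by decreasing MI with the labels.

    expression maps a gene name to a list of per-cell values.
    """
    scores = {gene: mutual_information(binarize(values), labels)
              for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def make_toy_data(seed=0, n_cells=200, n_noise_genes=20):
    """Synthetic data: one gene that tracks the phenotype plus pure-noise genes."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_cells)]  # two balanced phenotypes
    expression = {
        "marker_gene": [5.0 * lab + rng.gauss(0.0, 1.0) for lab in labels]
    }
    for j in range(n_noise_genes):
        expression[f"noise_gene_{j}"] = [rng.gauss(0.0, 1.0) for _ in labels]
    return expression, labels


expression, labels = make_toy_data(seed=0)
ranked = rank_genes(expression, labels)
# Keeping only the top-k genes trades gene-set size against retained information.
top_k = [gene for gene, _ in ranked[:3]]
print(top_k)
```

In this toy setting the phenotype-linked gene should rank first, while the noise genes carry close to zero information. A real analysis would work on preprocessed single-cell counts and would still need the validation step the abstract describes.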


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9cf/10276237/1e6b26f116f5/ga1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9cf/10276237/c733842e3c17/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9cf/10276237/5f1edc2c5bcd/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9cf/10276237/47c33ceebdfe/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9cf/10276237/80e30cd14abf/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9cf/10276237/a0e26246a307/gr5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9cf/10276237/d82421d9d84a/gr6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9cf/10276237/f95e5f19f3fd/gr7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9cf/10276237/8c7e3fa1f1b4/gr8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9cf/10276237/6172c8524cb7/gr9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9cf/10276237/43a42cccc595/gr10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9cf/10276237/f37c3d8ea594/gr11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9cf/10276237/67c92850777b/gr12.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9cf/10276237/7160dc401333/gr13.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9cf/10276237/da6302c380e6/gr14.jpg
