Heterogeneity-preserving discriminative feature selection for disease-specific subtype discovery.

Authors

M A Basher Abdur Rahman, Caleb Hallinan, Kwonmoo Lee

Affiliations

Vascular Biology Program, Boston Children's Hospital, Boston, MA, USA.

Department of Surgery, Harvard Medical School, Boston, MA, USA.

Publication information

Nat Commun. 2025 Apr 16;16(1):3593. doi: 10.1038/s41467-025-58718-1.

DOI:10.1038/s41467-025-58718-1
PMID:40234411
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12000357/
Abstract

Disease-specific subtype identification can deepen our understanding of disease progression and pave the way for personalized therapies, given the complexity of disease heterogeneity. Large-scale transcriptomic, proteomic, and imaging datasets create opportunities for discovering subtypes but also pose challenges due to their high dimensionality. To mitigate this, many feature selection methods focus on selecting features that distinguish known diseases or cell states, yet often miss features that preserve heterogeneity and reveal new subtypes. To overcome this gap, we develop Preserving Heterogeneity (PHet), a statistical methodology that employs iterative subsampling and differential analysis of interquartile range, in conjunction with Fisher's method, to identify a small set of features that enhance subtype clustering quality. Here, we show that this method can maintain sample heterogeneity while distinguishing known disease/cell states, with a tendency to outperform previous differential expression and outlier-based methods, indicating its potential to advance our understanding of disease mechanisms and cell differentiation.
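The abstract names three ingredients: iterative subsampling, differential analysis of the interquartile range (IQR), and Fisher's method for combining p-values. The toy sketch below illustrates how those pieces can fit together using only the Python standard library. It is not the authors' PHet implementation: the function names (`iqr`, `iqr_perm_pvalue`, `fisher_combine`, `rank_features`), the permutation test used to turn an IQR difference into a per-iteration p-value, the subsampling fraction, and all other defaults are assumptions for illustration. It also covers only the spread (heterogeneity) side and omits any discriminative scoring, thresholds, or clustering steps the published method may use. Fisher's method assumes independent p-values, and p-values from overlapping subsamples are correlated, so the combined value here is best read as a ranking score rather than a calibrated p-value.

```python
import math
import random
import statistics


def iqr(values):
    """Interquartile range (Q3 - Q1) via statistics.quantiles ('exclusive' method)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def iqr_perm_pvalue(x, y, n_perm, rng):
    """Permutation p-value for |IQR(x) - IQR(y)| (an assumed test, for illustration)."""
    observed = abs(iqr(x) - iqr(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(iqr(pooled[:len(x)]) - iqr(pooled[len(x):])) >= observed:
            hits += 1
    # The +1 correction keeps p strictly positive, which Fisher's method requires.
    return (hits + 1) / (n_perm + 1)


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-squared with 2k df under H0.

    For even degrees of freedom the survival function has the closed form
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def rank_features(case, control, n_iter=20, subsample=0.8, n_perm=200, seed=0):
    """Hypothetical helper: rank features by Fisher-combined IQR-difference p-values.

    case, control: lists of samples, each sample a list of feature values.
    Returns (feature indices sorted by combined value, combined value per feature).
    """
    rng = random.Random(seed)
    n_features = len(case[0])
    m_case = max(2, int(subsample * len(case)))
    m_ctrl = max(2, int(subsample * len(control)))
    per_feature_p = [[] for _ in range(n_features)]
    for _ in range(n_iter):
        # Iterative subsampling without replacement from each group.
        sub_case = rng.sample(case, m_case)
        sub_ctrl = rng.sample(control, m_ctrl)
        for j in range(n_features):
            x = [s[j] for s in sub_case]
            y = [s[j] for s in sub_ctrl]
            per_feature_p[j].append(iqr_perm_pvalue(x, y, n_perm, rng))
    combined = [fisher_combine(ps) for ps in per_feature_p]
    order = sorted(range(n_features), key=lambda j: combined[j])
    return order, combined


# Synthetic toy data: feature 0 takes the same 30 values in both groups; feature 1
# is bimodal in cases (two hidden "subtypes" near -3 and +3), so its spread differs
# from control even though both groups are centred near zero.
data_rng = random.Random(1)
shared = [data_rng.gauss(0, 1) for _ in range(30)]
control = [[shared[i], data_rng.gauss(0, 1)] for i in range(30)]
case = [[shared[i], (3.0 if i % 2 else -3.0) + data_rng.gauss(0, 0.5)]
        for i in range(30)]

order, combined = rank_features(case, control)
print("features ranked by combined value:", order)
```

A spread-based statistic such as the IQR is used here because a feature whose cases split into distinct subgroups can keep the same centre as control while its variability grows, which a mean-based differential test would tend to miss.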

Similar articles

1
Heterogeneity-preserving discriminative feature selection for disease-specific subtype discovery.
Nat Commun. 2025 Apr 16;16(1):3593. doi: 10.1038/s41467-025-58718-1.
2
Heterogeneity-Preserving Discriminative Feature Selection for Disease-Specific Subtype Discovery.
bioRxiv. 2025 Mar 5:2023.05.14.540686. doi: 10.1101/2023.05.14.540686.
3
Identification of expression patterns in the progression of disease stages by integration of transcriptomic data.
BMC Bioinformatics. 2016 Nov 22;17(Suppl 15):432. doi: 10.1186/s12859-016-1290-4.
4
Recursive Consensus Clustering for novel subtype discovery from transcriptome data.
Sci Rep. 2020 Jul 3;10(1):11005. doi: 10.1038/s41598-020-67016-3.
5
Dissecting cancer heterogeneity based on dimension reduction of transcriptomic profiles using extreme learning machines.
PLoS One. 2018 Sep 14;13(9):e0203824. doi: 10.1371/journal.pone.0203824. eCollection 2018.
6
A consensus multi-view multi-objective gene selection approach for improved sample classification.
BMC Bioinformatics. 2020 Sep 17;21(Suppl 13):386. doi: 10.1186/s12859-020-03681-5.
7
Selecting single cell clustering parameter values using subsampling-based robustness metrics.
BMC Bioinformatics. 2021 Feb 1;22(1):39. doi: 10.1186/s12859-021-03957-4.
8
GLassonet: Identifying Discriminative Gene Sets Among Molecular Subtypes of Breast Cancer.
IEEE/ACM Trans Comput Biol Bioinform. 2023 May-Jun;20(3):1905-1916. doi: 10.1109/TCBB.2022.3220623. Epub 2023 Jun 5.
9
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
10
A robust tool for discriminative analysis and feature selection in paired samples impacts the identification of the genes essential for reprogramming lung tissue to adenocarcinoma.
BMC Genomics. 2011 Nov 30;12 Suppl 3(Suppl 3):S24. doi: 10.1186/1471-2164-12-S3-S24.

Cited by

1
Diffraction-informed deep learning for molecular-specific holograms of breast cancer cells.
APL Bioeng. 2025 Jul 23;9(3):036107. doi: 10.1063/5.0246495. eCollection 2025 Sep.
