
Feature Selection Methods for Protein Biomarker Discovery from Proteomics or Multiomics Data.

Affiliations

Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.

Department of Liver Surgery and Transplantation, Liver Cancer Institute, Zhongshan Hospital, Fudan University, and Key Laboratory of Carcinogenesis and Cancer Invasion of Ministry of Education, Shanghai, China.

Publication information

Mol Cell Proteomics. 2021;20:100083. doi: 10.1016/j.mcpro.2021.100083. Epub 2021 Apr 20.

DOI: 10.1016/j.mcpro.2021.100083
PMID: 33887487
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8165452/
Abstract

Untargeted mass spectrometry (MS)-based proteomics provides a powerful platform for protein biomarker discovery, but clinical translation depends on the selection of a small number of proteins for downstream verification and validation. Due to the small sample size of typical discovery studies, protein markers identified from discovery data may not be generalizable to independent datasets. In addition, a good protein marker identified using a discovery platform may be difficult to implement in verification and validation platforms. Moreover, although multiomics characterization is being increasingly used in discovery cohort studies, there is no existing method for multiomics-facilitated protein biomarker selection. Here, we present ProMS, a computational algorithm for protein marker selection. The algorithm is based on the hypothesis that a phenotype is characterized by a few underlying biological functions, each manifested by a group of coexpressed proteins. A weighted k-medoids clustering algorithm is applied to all univariately informative proteins to identify both coexpressed protein clusters and a representative protein for each cluster as markers. In two clinically important classification problems, ProMS shows superior performance compared with existing feature selection methods. ProMS can be extended to the multiomics setting (ProMS_mo) through a constrained weighted k-medoids clustering algorithm, and the protein panels selected by ProMS_mo show improved performance on independent test data compared with ProMS. In addition to superior performance, ProMS and ProMS_mo also have two unique strengths. First, the feature clusters enable functional interpretation of the selected protein markers. Second, the feature clusters provide an opportunity to select replacement protein markers, facilitating a robust transition to the verification and validation platforms. 
In summary, this study provides a unified and effective computational framework for selecting protein biomarkers using proteomics or multiomics data. The software implementation is publicly available at https://github.com/bzhanglab/proms.
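
The selection procedure described in the abstract (keep the univariately informative proteins, cluster them with a weighted k-medoids algorithm, and report each cluster's medoid as a marker) can be sketched roughly as follows. This is a minimal illustration, not the ProMS implementation: the function names `select_markers` and `weighted_k_medoids`, the `keep_fraction` parameter, the use of absolute Pearson correlation with a binary label as both the univariate score and the clustering weight, the distance `1 - |r|`, and the farthest-first initialization are all assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def weighted_k_medoids(dist, weights, k, max_iter=100):
    """Alternate assignment and medoid update to reduce sum_i w_i * d(i, medoid(i)).

    Assumes k <= number of features. Returns (medoid indices, assignment list).
    """
    n = len(weights)
    # Assumed initialization: start from the highest-weight feature, then add
    # the feature with the largest weight * distance to the chosen medoids.
    medoids = [max(range(n), key=lambda i: weights[i])]
    while len(medoids) < k:
        rest = [i for i in range(n) if i not in medoids]
        medoids.append(
            max(rest, key=lambda i: weights[i] * min(dist[i][m] for m in medoids))
        )
    for _ in range(max_iter):
        # Assignment step: each feature joins its nearest medoid.
        assign = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
        updated = []
        for m in medoids:
            # An empty cluster (possible only with zero-distance ties) keeps its medoid.
            members = [i for i in range(n) if assign[i] == m] or [m]
            # Update step: the member with the lowest weighted distance to the others.
            updated.append(
                min(members, key=lambda c: sum(weights[i] * dist[i][c] for i in members))
            )
        if updated == medoids:
            break
        medoids = updated
    assign = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, assign


def select_markers(X, y, k, keep_fraction=0.5):
    """X maps protein name -> abundances per sample; y holds 0/1 class labels.

    Returns {marker protein: [proteins in its cluster]}.
    """
    # Univariate filter (assumed score): |correlation| between abundance and label.
    scores = {p: abs(pearson(v, y)) for p, v in X.items()}
    n_keep = max(k, int(len(X) * keep_fraction))
    kept = sorted(X, key=lambda p: -scores[p])[:n_keep]
    # Coexpression distance (assumed): 1 - |Pearson r| between protein profiles.
    dist = [[max(0.0, 1 - abs(pearson(X[a], X[b]))) for b in kept] for a in kept]
    weights = [scores[p] for p in kept]
    medoids, assign = weighted_k_medoids(dist, weights, k)
    return {
        kept[m]: [kept[i] for i in range(len(kept)) if assign[i] == m]
        for m in medoids
    }
```

Calling `select_markers(X, y, k=2)` returns a dictionary keyed by the selected marker proteins. The non-medoid members of each cluster correspond to the replacement candidates mentioned in the abstract, since they are coexpressed with the marker they would replace.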




