Discovery of potential biomarkers for lung cancer classification based on human proteome microarrays using Stochastic Gradient Boosting approach.

Affiliations

Department of Health Statistics, College of Preventive Medicine, Army Medical University, No.30 Gaotanyan Street, Shapingba District, Chongqing, 400038, China.

Chongqing Center for Disease Control and Prevention, No.8 Changjiang 2nd Street, Yuzhong District, Chongqing, 400042, China.

Publication information

J Cancer Res Clin Oncol. 2023 Aug;149(10):6803-6812. doi: 10.1007/s00432-023-04643-z. Epub 2023 Feb 18.

DOI: 10.1007/s00432-023-04643-z
PMID: 36807761
Abstract

PURPOSE

Early identification of lung cancer (LC) would considerably facilitate LC intervention and prevention. Human proteome microarrays can serve as a "liquid biopsy" for diagnosing LC, complementing conventional diagnosis; the approach, however, requires advanced bioinformatics methods such as feature selection (FS) and refined machine learning models.

METHODS

A two-stage FS methodology, combining Pearson's correlation (PC) with either a univariate filter (SBF) or recursive feature elimination (RFE), was used to reduce redundancy in the original dataset. Stochastic Gradient Boosting (SGB), Random Forest (RF), and Support Vector Machine (SVM) techniques were applied to build ensemble classifiers based on four subsets. The synthetic minority oversampling technique (SMOTE) was used to preprocess the imbalanced data.
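
The two-stage FS idea (a Pearson-correlation redundancy filter followed by a univariate filter) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract gives neither the correlation cutoff, nor the rule for deciding which of two correlated features to drop, nor the univariate score, so the 0.9 cutoff, the greedy keep-first rule, the Welch t-statistic, and the `top_k` value below are all assumptions. The RFE branch is not shown.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def drop_correlated(columns, cutoff=0.9):
    """Stage 1 (assumed greedy rule): keep a feature only if its |r| with every
    feature kept so far is <= cutoff. `columns` maps feature name -> values."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


def abs_welch_t(values, labels):
    """Stage 2 score (assumed): |Welch t| between class 1 and class 0."""
    g1 = [v for v, y in zip(values, labels) if y == 1]
    g0 = [v for v, y in zip(values, labels) if y == 0]
    diff = abs(statistics.fmean(g1) - statistics.fmean(g0))
    se = math.sqrt(statistics.variance(g1) / len(g1) + statistics.variance(g0) / len(g0))
    if se == 0:
        return math.inf if diff else 0.0
    return diff / se


def select_features(columns, labels, cutoff=0.9, top_k=10):
    """Redundancy filter first, then rank survivors by the univariate score."""
    stage1 = drop_correlated(columns, cutoff)
    ranked = sorted(stage1, key=lambda n: abs_welch_t(columns[n], labels), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]  # hypothetical toy data
    columns = {
        "A": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
        "A_copy": [2.0, 2.4, 1.8, 6.0, 6.2, 5.8],  # exactly 2 * A, so redundant
        "B": [0.5, 0.7, 0.4, 0.6, 0.5, 0.8],
    }
    print(select_features(columns, labels, top_k=2))  # ['A', 'B']
```

Filtering redundancy before ranking means the univariate stage never spends a slot on a near-copy of a feature it has already kept.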

RESULTS

The FS approach with SBF and RFE extracted 25 and 55 features, respectively, with 14 features in common. All three ensemble models demonstrated high accuracy (0.867 to 0.967) and sensitivity (0.917 to 1.00) on the test datasets, with the SGB model on the SBF subset outperforming the others. SMOTE improved model performance during training. Three of the top-ranked candidate biomarkers (LGR4, CDC34, and GHRHR) were strongly suggested to play a role in lung tumorigenesis.
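
SMOTE balances classes by interpolation rather than duplication: each synthetic minority-class sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The sketch below is illustrative only and assumes numeric feature tuples, Euclidean distance, k = 5, and a randomly chosen base sample per synthetic point; the abstract does not report the SMOTE settings used. Oversampling is normally applied to the training data only, so that synthetic points do not leak into the test set.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling (sketch): each synthetic point lies at a random
    position on the segment between a minority sample and one of its k nearest
    minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # base sample chosen at random (assumed variant)
        x = minority[i]
        # k nearest neighbours of x among the other minority samples
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


if __name__ == "__main__":
    majority = [(0.0, 0.0)] * 10  # hypothetical data: 10 majority rows, 4 minority rows
    minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
    new_rows = smote(minority, len(majority) - len(minority))
    print(len(minority) + len(new_rows))  # 10, i.e. balanced with the majority class
```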

CONCLUSION

A novel hybrid FS method combined with classical ensemble machine learning algorithms was used for the first time to classify protein microarray data. The parsimonious model built with the SGB algorithm, together with appropriate FS and SMOTE, performed well in the classification task, with high sensitivity and specificity. Standardized and innovative bioinformatics approaches for protein microarray analysis need further exploration and validation.
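
For readers unfamiliar with SGB, a minimal sketch of stochastic gradient boosting for a binary label is given below, together with the sensitivity and specificity used to judge it. It is not the authors' model: log-loss, depth-1 trees (stumps), one-step Newton leaf values, 100 trees, shrinkage 0.1 and a 50 % subsample are all assumed settings, and a real analysis would use an established implementation such as the R `gbm` package.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def newton_value(rows, r, w):
    """One Newton step for log-loss in a leaf: sum(residual) / sum(p * (1 - p))."""
    return sum(r[i] for i in rows) / max(sum(w[i] for i in rows), 1e-12)


def fit_stump(X, r, w, idx):
    """Fit a depth-1 tree to pseudo-residuals r on rows idx (least-squares split)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({X[i][j] for i in idx}):
            left = [i for i in idx if X[i][j] <= t]
            right = [i for i in idx if X[i][j] > t]
            if not right:
                continue
            sl = sum(r[i] for i in left)
            sr = sum(r[i] for i in right)
            gain = sl * sl / len(left) + sr * sr / len(right)
            if best is None or gain > best[0]:
                best = (gain, j, t, left, right)
    if best is None:  # no usable split: constant update
        return 0, math.inf, newton_value(idx, r, w), 0.0
    _, j, t, left, right = best
    return j, t, newton_value(left, r, w), newton_value(right, r, w)


def fit_sgb(X, y, n_trees=100, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting with stumps for labels y in {0, 1}."""
    rng = random.Random(seed)
    n, n_pos = len(y), sum(y)
    if n_pos in (0, n):
        raise ValueError("both classes must be present")
    f0 = math.log(n_pos / (n - n_pos))  # initial log-odds
    F = [f0] * n
    m = min(n, max(2, int(subsample * n)))
    stumps = []
    for _ in range(n_trees):
        p = [sigmoid(f) for f in F]
        r = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of log-loss
        w = [pi * (1.0 - pi) for pi in p]
        idx = rng.sample(range(n), m)  # the "stochastic" part: subsample without replacement
        j, t, vl, vr = fit_stump(X, r, w, idx)
        stump = (j, t, learning_rate * vl, learning_rate * vr)
        stumps.append(stump)
        for i in range(n):  # update scores for all rows, not just the subsample
            F[i] += stump[2] if X[i][j] <= t else stump[3]
    return f0, stumps


def predict_proba(model, x):
    f0, stumps = model
    return sigmoid(f0 + sum(vl if x[j] <= t else vr for j, t, vl, vr in stumps))


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    X = [[0.1], [0.2], [0.3], [0.4], [0.9], [1.0], [1.1], [1.2]]  # hypothetical toy data
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    model = fit_sgb(X, y)
    preds = [int(predict_proba(model, x) >= 0.5) for x in X]
    print(sensitivity_specificity(y, preds))
```

Drawing a fresh random subsample without replacement at every iteration is what distinguishes stochastic gradient boosting from plain gradient boosting; the added randomness tends to reduce overfitting.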

Similar articles

1. Stacked Ensemble Learning for Classification of Parkinson's Disease Using Telemonitoring Vocal Features.
   Diagnostics (Basel). 2025 Jun 9;15(12):1467. doi: 10.3390/diagnostics15121467.
2. Machine learning-based radiomics for differentiating lung cancer subtypes in brain metastases using CE-T1WI.
   Front Oncol. 2025 Jun 19;15:1599882. doi: 10.3389/fonc.2025.1599882. eCollection 2025.
3. Proposal for Using AI to Assess Clinical Data Integrity and Generate Metadata: Algorithm Development and Validation.
   JMIR Med Inform. 2025 Jun 30;13:e60204. doi: 10.2196/60204.
4. Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
   Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
5. XGB-BIF: An XGBoost-Driven Biomarker Identification Framework for Detecting Cancer Using Human Genomic Data.
   Int J Mol Sci. 2025 Jun 11;26(12):5590. doi: 10.3390/ijms26125590.
6. Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights.
   Comput Methods Programs Biomed. 2025 Jun 21;269:108899. doi: 10.1016/j.cmpb.2025.108899.
7. Artificial intelligence for diagnosing exudative age-related macular degeneration.
   Cochrane Database Syst Rev. 2024 Oct 17;10(10):CD015522. doi: 10.1002/14651858.CD015522.pub2.
8. Can a Liquid Biopsy Detect Circulating Tumor DNA With Low-passage Whole-genome Sequencing in Patients With a Sarcoma? A Pilot Evaluation.
   Clin Orthop Relat Res. 2025 Jan 1;483(1):39-48. doi: 10.1097/CORR.0000000000003161. Epub 2024 Jun 21.
9. Interpretable Machine Learning for Serum-Based Metabolomics in Breast Cancer Diagnostics: Insights from Multi-Objective Feature Selection-Driven LightGBM-SHAP Models.
   Medicina (Kaunas). 2025 Jun 19;61(6):1112. doi: 10.3390/medicina61061112.

本文引用的文献

1
Graph-based relevancy-redundancy gene selection method for cancer diagnosis.基于图的相关性-冗余基因选择方法用于癌症诊断。
Comput Biol Med. 2022 Aug;147:105766. doi: 10.1016/j.compbiomed.2022.105766. Epub 2022 Jun 27.
2
Performance-weighted-voting model: An ensemble machine learning method for cancer type classification using whole-exome sequencing mutation.性能加权投票模型:一种使用全外显子组测序突变进行癌症类型分类的集成机器学习方法。
Quant Biol. 2020 Dec 24;8(4):347-358. doi: 10.1007/s40484-020-0226-1. Epub 2020 Dec 7.
3
Screening for Lung Cancer With Low-Dose Computed Tomography: Updated Evidence Report and Systematic Review for the US Preventive Services Task Force.
用低剂量计算机断层扫描进行肺癌筛查:美国预防服务工作组的更新证据报告和系统评价。
JAMA. 2021 Mar 9;325(10):971-987. doi: 10.1001/jama.2021.0377.
4
Growth Hormone-Releasing Hormone in Lung Physiology and Pulmonary Disease.生长激素释放激素在肺生理学和肺部疾病中的作用。
Cells. 2020 Oct 21;9(10):2331. doi: 10.3390/cells9102331.
5
Identification and validation of the prognostic value of immune-related genes in non-small cell lung cancer.非小细胞肺癌中免疫相关基因预后价值的鉴定与验证
Am J Transl Res. 2020 Sep 15;12(9):5844-5865. eCollection 2020.
6
G-Forest: An ensemble method for cost-sensitive feature selection in gene expression microarrays.G-Forest:一种用于基因表达微阵列中成本敏感特征选择的集成方法。
Artif Intell Med. 2020 Aug;108:101941. doi: 10.1016/j.artmed.2020.101941. Epub 2020 Aug 14.
7
Machine Learning-Based Ensemble Recursive Feature Selection of Circulating miRNAs for Cancer Tumor Classification.基于机器学习的循环miRNA集成递归特征选择用于癌症肿瘤分类
Cancers (Basel). 2020 Jul 3;12(7):1785. doi: 10.3390/cancers12071785.
8
Autoantibody signature in hepatocellular carcinoma using seromics.基于血清蛋白质组学的肝细胞癌自身抗体特征
J Hematol Oncol. 2020 Jul 2;13(1):85. doi: 10.1186/s13045-020-00918-x.
9
Analysis of expression differences of immune genes in non-small cell lung cancer based on TCGA and ImmPort data sets and the application of a prognostic model.基于TCGA和ImmPort数据集的非小细胞肺癌免疫基因表达差异分析及预后模型的应用
Ann Transl Med. 2020 Apr;8(8):550. doi: 10.21037/atm.2020.04.38.
10
Targeting CDC34 E2 ubiquitin conjugating enzyme for lung cancer therapy.靶向CDC34 E2泛素结合酶用于肺癌治疗。
EBioMedicine. 2020 Apr;54:102718. doi: 10.1016/j.ebiom.2020.102718. Epub 2020 Apr 5.