VSOLassoBag: a variable-selection oriented LASSO bagging algorithm for biomarker discovery in omic-based translational research.

Authors

Liang Jiaqi, Wang Chaoye, Zhang Di, Xie Yubin, Zeng Yanru, Li Tianqin, Zuo Zhixiang, Ren Jian, Zhao Qi

Affiliations

State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China; State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, Guangdong 510275, China.

State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China.

Publication information

J Genet Genomics. 2023 Mar;50(3):151-162. doi: 10.1016/j.jgg.2022.12.005. Epub 2023 Jan 3.

DOI: 10.1016/j.jgg.2022.12.005
PMID: 36608930

Abstract

Screening biomolecular markers from high-dimensional biological data is a long-standing task in biomedical translational research. With its advantages in both feature shrinkage and biological interpretability, the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm is one of the most popular methods for clinical biomarker development. In practice, however, applying LASSO to high-dimensional, low-sample-size omics data often yields an excess of predictive variables, which leads to overfitting. Here, we present VSOLassoBag, a wrapped LASSO approach that integrates an ensemble learning strategy to help select efficient and stable variables with high confidence from omics-based data. Using a bagging strategy in combination with a parametric method or an inflection point search method, VSOLassoBag integrates and votes on the variables generated by multiple LASSO models to determine the optimal candidates. Applying VSOLassoBag to both simulated and real-world datasets shows that the algorithm can effectively identify markers for either case-control binary classification or prognosis prediction. In addition, in comparisons with multiple existing algorithms, VSOLassoBag shows comparable performance under different scenarios while yielding fewer features. In summary, VSOLassoBag, which is available at https://seqworld.com/VSOLassoBag/ under the GPL v3 license, provides an alternative strategy for selecting reliable biomarkers from high-dimensional omics data. For users' convenience, we implement VSOLassoBag as an R package that provides multithreading computing configurations.
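The bagging-and-voting idea described in the abstract can be illustrated with a small, self-contained Python sketch. This is not the VSOLassoBag R package or its API, and it does not reproduce the authors' exact procedure. All function names (`lasso_cd`, `bagged_selection_counts`, `largest_gap_cutoff`) are hypothetical, the penalty `lam` is fixed by hand rather than tuned, the response is continuous rather than binary or survival, and the cutoff is a simple largest-gap heuristic standing in for an inflection point search. The parametric cutoff method is not shown.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=30):
    """Coordinate descent for (1/(2n))*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # running residual y - Xb
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def bagged_selection_counts(X, y, lam, n_bags=30, seed=0):
    """Fit LASSO on bootstrap resamples; count how often each feature is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1  # one vote per bag with a nonzero coefficient
    return counts


def largest_gap_cutoff(counts):
    """Keep features ranked above the largest drop in selection frequency.

    Assumption: a crude stand-in for an inflection point search.
    """
    order = sorted(range(len(counts)), key=lambda j: -counts[j])
    freqs = [counts[j] for j in order]
    if len(freqs) < 2:
        return order
    gaps = [freqs[k] - freqs[k + 1] for k in range(len(freqs) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    return sorted(order[: cut + 1])


# Toy data (hypothetical): 60 samples, 20 features, only features 0 and 1 matter.
data_rng = random.Random(42)
n, p = 60, 20
X = [[data_rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 3 * row[1] + 0.5 * data_rng.gauss(0, 1) for row in X]

counts = bagged_selection_counts(X, y, lam=0.5, n_bags=30, seed=1)
selected = largest_gap_cutoff(counts)
print("selection counts:", counts)
print("selected features:", selected)
```

On toy data like this, the two signal features should collect votes in nearly every bag while noise features are picked rarely, so the frequency ranking shows a clear drop between them. A single LASSO fit with a small penalty would typically retain more noise features, which is the overfitting problem the abstract describes.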

Similar articles

1. VSOLassoBag: a variable-selection oriented LASSO bagging algorithm for biomarker discovery in omic-based translational research.
   J Genet Genomics. 2023 Mar;50(3):151-162. doi: 10.1016/j.jgg.2022.12.005. Epub 2023 Jan 3.
2. -Omics biomarker identification pipeline for translational medicine.
   J Transl Med. 2019 May 14;17(1):155. doi: 10.1186/s12967-019-1912-5.
3. Least absolute shrinkage and selection operator type methods for the identification of serum biomarkers of overweight and obesity: simulation and application.
   BMC Med Res Methodol. 2016 Nov 14;16(1):154. doi: 10.1186/s12874-016-0254-8.
4. Combined Performance of Screening and Variable Selection Methods in Ultra-High Dimensional Data in Predicting Time-To-Event Outcomes.
   Diagn Progn Res. 2018;2. doi: 10.1186/s41512-018-0043-4. Epub 2018 Sep 26.
5. TSPLASSO: A Two-stage Prior LASSO Algorithm for Gene Selection using Omics Data.
   IEEE J Biomed Health Inform. 2023 Oct 23;PP. doi: 10.1109/JBHI.2023.3326485.
6. Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso.
   J Biomed Inform. 2015 Feb;53:277-90. doi: 10.1016/j.jbi.2014.11.013. Epub 2014 Dec 9.
7. Identifying interactions in omics data for clinical biomarker discovery using symbolic regression.
   Bioinformatics. 2022 Aug 2;38(15):3749-3758. doi: 10.1093/bioinformatics/btac405.
8. Stable Iterative Variable Selection.
   Bioinformatics. 2021 Dec 11;37(24):4810-4817. doi: 10.1093/bioinformatics/btab501.
9. Improved NSGA-II algorithms for multi-objective biomarker discovery.
   Bioinformatics. 2022 Sep 16;38(Suppl_2):ii20-ii26. doi: 10.1093/bioinformatics/btac463.
10. Scoring relevancy of features based on combinatorial analysis of Lasso with application to lymphoma diagnosis.
   BMC Genomics. 2013;14 Suppl 1(Suppl 1):S14. doi: 10.1186/1471-2164-14-S1-S14. Epub 2013 Jan 21.

Cited by

1. Integrated metabolomic-lipidomic profiling reveals novel biomarkers and therapeutic targets for alcohol use disorder with cognitive impairment.
   Front Psychiatry. 2025 Jun 13;16:1594313. doi: 10.3389/fpsyt.2025.1594313. eCollection 2025.
2. Predicting pathological response to neoadjuvant chemoradiotherapy in locally advanced rectal cancer with two step feature selection and ensemble learning.
   Sci Rep. 2025 Mar 22;15(1):9936. doi: 10.1038/s41598-025-94337-y.
3. Proteomic and serological markers for diagnosing cardia gastric cancer and precursor lesions in a Chinese population.
   Sci Rep. 2024 Oct 25;14(1):25309. doi: 10.1038/s41598-024-75912-1.
4. Repeated Sieving for Prediction Model Building with High-Dimensional Data.
   J Pers Med. 2024 Jul 19;14(7):769. doi: 10.3390/jpm14070769.
5. Metabolomics signatures of sweetened beverages and added sugar are related to anthropometric measures of adiposity in young individuals: results from a cohort study.
   Am J Clin Nutr. 2024 Oct;120(4):879-890. doi: 10.1016/j.ajcnut.2024.07.021. Epub 2024 Jul 24.
6. Filter and Wrapper Stacking Ensemble (FWSE): a robust approach for reliable biomarker discovery in high-dimensional omics data.
   Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad382.