VSOLassoBag: a variable-selection oriented LASSO bagging algorithm for biomarker discovery in omic-based translational research.

Author information

Liang Jiaqi, Wang Chaoye, Zhang Di, Xie Yubin, Zeng Yanru, Li Tianqin, Zuo Zhixiang, Ren Jian, Zhao Qi

Affiliations

State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China; State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, Guangdong 510275, China.

State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China.

Publication information

J Genet Genomics. 2023 Mar;50(3):151-162. doi: 10.1016/j.jgg.2022.12.005. Epub 2023 Jan 3.

Abstract

Screening biomolecular markers from high-dimensional biological data is a long-standing task in biomedical translational research. With its advantages in both feature shrinkage and biological interpretability, the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm is one of the most popular methods for clinical biomarker development. However, in practice, applying LASSO to omics-based data with high dimensionality and low sample size often results in an excessive number of predictive variables, leading to overfitting of the model. Here, we present VSOLassoBag, a wrapped LASSO approach that integrates an ensemble learning strategy to help select efficient and stable variables with high confidence from omics-based data. Using a bagging strategy in combination with a parametric method or an inflection point search method, VSOLassoBag integrates and votes on the variables selected by multiple LASSO models to determine the optimal candidates. Applying VSOLassoBag to both simulated and real-world datasets shows that the algorithm can effectively identify markers for either case-control binary classification or prognosis prediction. In addition, compared with multiple existing algorithms, VSOLassoBag shows comparable performance under different scenarios while selecting fewer features than the others. In summary, VSOLassoBag, which is available at https://seqworld.com/VSOLassoBag/ under the GPL v3 license, provides an alternative strategy for selecting reliable biomarkers from high-dimensional omics data. For users' convenience, we implement VSOLassoBag as an R package that supports multithreaded computation.
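The bagging-and-voting idea described above can be illustrated with a small sketch. This is a minimal pure-Python illustration, not the VSOLassoBag R package or its API. The function names, the fixed penalty `lam`, the squared-error loss on a continuous outcome (the package targets binary classification and prognosis), and the "farthest point from the chord" elbow rule are all assumptions made for illustration. The paper's parametric method is not shown, and its inflection point search may work differently.

```python
import random

# Hypothetical illustration of bagged LASSO voting; NOT the VSOLassoBag API.


def soft_threshold(z, gamma):
    # Closed-form solution of a one-dimensional LASSO problem.
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, max_iter=100, tol=1e-6):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.
    X and y are centred internally, which acts as an unpenalised intercept."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual y - Xb while b = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column in this resample: keep b[j] = 0
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def bagged_selection_frequency(X, y, lam, n_bags=30, seed=0):
    """For each feature, count the bootstrap LASSO fits giving it a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: rows with replacement
        coef = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, c in enumerate(coef):
            if c != 0.0:
                counts[j] += 1  # one "vote" for feature j
    return counts


def elbow_select(counts):
    """Keep features whose vote count lies above the elbow of the sorted vote curve.
    Assumed elbow rule: the point farthest from the chord joining first and last points."""
    s = sorted(counts, reverse=True)
    m = len(s)
    if m < 3:
        return list(range(m))
    dx, dy = m - 1, s[-1] - s[0]
    # Perpendicular distance to the chord, up to a constant factor.
    dist = [abs(dy * k - dx * (s[k] - s[0])) for k in range(m)]
    knee = max(range(m), key=lambda k: dist[k])
    cutoff = s[max(knee, 1) - 1]  # vote count of the last feature ranked above the knee
    return [j for j, c in enumerate(counts) if c >= cutoff]


# Usage on simulated data: only features 0, 1 and 2 carry signal.
rng = random.Random(42)
n, p = 60, 10
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * r[0] - 2 * r[1] + 2 * r[2] + rng.gauss(0, 0.5) for r in X]
votes = bagged_selection_frequency(X, y, lam=0.1)
print(votes)
print(elbow_select(votes))
```

A single LASSO fit on a small sample can admit noise features by chance. Counting how often each feature survives across bootstrap resamples and then cutting at the elbow of the vote curve is one simple way to keep only features that are selected consistently.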
