PLASMA: Partial LeAst Squares for Multiomics Analysis.

Authors

Yamaguchi Kyoko, Abdelbaky Salma, Yu Lianbo, Oakes Christopher C, Abruzzo Lynne V, Coombes Kevin R

Affiliations

Division of Hematology, Department of Internal Medicine, Ohio State University, Columbus, OH 43210, USA.

Department of Biomedical Informatics, Ohio State University, Columbus, OH 43210, USA.

Publication information

Cancers (Basel). 2025 Jan 17;17(2):287. doi: 10.3390/cancers17020287.

Abstract

Background: Recent growth in the number and applications of high-throughput "omics" technologies has created a need for better methods to integrate multiomics data. Much progress has been made in developing unsupervised methods, but supervised methods have lagged behind.

Methods: Here we present the first algorithm, PLASMA, that can learn to predict time-to-event outcomes from multiomics data sets, even when some samples have only been assayed on a subset of the omics data sets. PLASMA uses two layers of existing partial least squares algorithms to first select components that covary with the outcome and then construct a joint Cox proportional hazards model.

Results: We apply PLASMA to the stomach adenocarcinoma (STAD) data from The Cancer Genome Atlas. We validate the model both by splitting the STAD data into training and test sets and by applying it to the subset of esophageal cancer (ESCA) containing adenocarcinomas. We use the other half of the ESCA data, which contains squamous cell carcinomas dissimilar to STAD, as a negative comparison. Our model successfully separates both the STAD test set (p = 2.73 × 10) and the independent ESCA adenocarcinoma data (p = 0.025) into high-risk and low-risk patients. It does not separate the negative comparison data set (ESCA squamous cell carcinomas, p = 0.57). The performance of the unified multiomics model is superior to that of individually trained models and is also superior to an unsupervised method (Multi-Omics Factor Analysis; MOFA), which finds latent factors to be used as putative predictors in a post hoc survival analysis.

Conclusions: Many of the factors that contribute strongly to the PLASMA model can be justified from the biological literature.
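The abstract reports p-values for how well the model separates high-risk from low-risk patients, but it does not say which test produced them or how the risk threshold was chosen. As a minimal sketch of how such a separation can be scored, the Python below assumes a median split of a model's risk scores followed by a two-group log-rank test. Both choices are assumptions, and the function names `split_by_median` and `logrank_p` are hypothetical; this is not the PLASMA implementation.

```python
import math
from statistics import median


def split_by_median(risk_scores):
    """Label each sample 1 (high risk) if its score is above the median, else 0.

    The median cutoff is an assumption; the abstract does not state the threshold.
    """
    cut = median(risk_scores)
    return [1 if r > cut else 0 for r in risk_scores]


def logrank_p(times, events, groups):
    """Two-group log-rank test.

    times  : follow-up time for each sample
    events : 1 if the event was observed, 0 if the sample was censored
    groups : 1 for high risk, 0 for low risk
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Samples still at risk just before time t, overall and in group 1.
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        # Events occurring exactly at time t, overall and in group 1.
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 event count at time t.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    risk = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    times = [1, 2, 3, 4, 5, 6, 7, 8]
    events = [1, 1, 1, 1, 1, 1, 1, 1]
    chi2, p = logrank_p(times, events, split_by_median(risk))
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

In this framing, a small p-value on held-out data (as reported for the STAD test set and ESCA adenocarcinomas) indicates that the two risk groups have different survival, while a large one (as for the ESCA squamous cell carcinomas) indicates no detectable separation.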

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ec17/11763701/8bf5ef616120/cancers-17-00287-g001.jpg
