Sufficient principal component regression for pattern discovery in transcriptomic data.

Authors

Ding Lei, Zentner Gabriel E, McDonald Daniel J

Affiliations

Department of Statistics, Indiana University, Bloomington, IN 47405, USA.

Department of Biology, Indiana University, Bloomington, IN 47405, USA.

Publication information

Bioinform Adv. 2022 May 14;2(1):vbac033. doi: 10.1093/bioadv/vbac033. eCollection 2022.

DOI:10.1093/bioadv/vbac033
PMID:35722206
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9194947/

Abstract

MOTIVATION

Methods for the global measurement of transcript abundance such as microarrays and RNA-Seq generate datasets in which the number of measured features far exceeds the number of observations. Extracting biologically meaningful and experimentally tractable insights from such data therefore requires high-dimensional prediction. Existing sparse linear approaches to this challenge have been stunningly successful, but some important issues remain. These methods can fail to select the correct features, predict poorly relative to non-sparse alternatives or ignore any unknown grouping structures for the features.

RESULTS

We propose a method called SuffPCR that yields improved predictions in high-dimensional tasks including regression and classification, especially in the typical context of omics with correlated features. SuffPCR first estimates sparse principal components and then estimates a linear model on the recovered subspace. Because the estimated subspace is sparse in the features, the resulting predictions will depend on only a small subset of genes. SuffPCR works well on a variety of simulated and experimental transcriptomic data, performing nearly optimally when the model assumptions are satisfied. We also demonstrate near-optimal theoretical guarantees.
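The two-step idea described above (estimate sparse principal components, then fit a linear model on the recovered subspace) can be illustrated with a minimal sketch. This is not the authors' SuffPCR estimator: it swaps in a single-component truncated power iteration as a simple stand-in for the sparse PCA step, and the function names, the sparsity level `k` and the simulated data are all hypothetical choices made for illustration. It uses only the Python standard library so that it stays self-contained.

```python
import random


def sparse_leading_component(X, k, n_iter=50):
    """Leading principal direction with at most k nonzero loadings.

    Stand-in sparse PCA step (truncated power iteration), not SuffPCR itself.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # Start from the feature with the largest sample variance.
    var = [sum(r[j] ** 2 for r in Xc) for j in range(p)]
    v = [0.0] * p
    v[max(range(p), key=var.__getitem__)] = 1.0
    for _ in range(n_iter):
        scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]  # Xc v
        w = [sum(Xc[i][j] * scores[i] for i in range(n)) for j in range(p)]  # Xc^T Xc v
        # Keep the k largest loadings in absolute value, zero out the rest.
        keep = set(sorted(range(p), key=lambda j: abs(w[j]), reverse=True)[:k])
        w = [w[j] if j in keep else 0.0 for j in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return means, v


def fit_sparse_pcr(X, y, k):
    """Regress y on the sparse component score; return (intercept, beta)."""
    means, v = sparse_leading_component(X, k)
    p = len(v)
    scores = [sum((row[j] - means[j]) * v[j] for j in range(p)) for row in X]
    y_bar = sum(y) / len(y)
    slope = sum(s * (yi - y_bar) for s, yi in zip(scores, y)) / sum(s * s for s in scores)
    # Map the one-dimensional fit back to feature space; beta inherits v's zeros.
    beta = [slope * vj for vj in v]
    intercept = y_bar - sum(b * m for b, m in zip(beta, means))
    return intercept, beta


# Simulated example (hypothetical data): 40 samples, 200 "genes", of which
# only the first 5 are driven by a latent factor that also drives the response.
rng = random.Random(0)
n, p, k = 40, 200, 5
latent = [rng.gauss(0, 1) for _ in range(n)]
X = [[(2.0 * latent[i] if j < k else 0.0) + rng.gauss(0, 0.5) for j in range(p)]
     for i in range(n)]
y = [3.0 * latent[i] + rng.gauss(0, 0.1) for i in range(n)]

intercept, beta = fit_sparse_pcr(X, y, k)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("features with nonzero coefficients:", selected)
```

Because the coefficient vector is a multiple of the sparse loading vector, predictions depend only on the features with nonzero loadings, which is the property the abstract highlights. In practice the number of components and the sparsity level would need to be chosen from the data, for example by cross-validation; the authors' package linked below is the reference implementation.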

AVAILABILITY AND IMPLEMENTATION

Code and raw data are freely available at https://github.com/dajmcdon/suffpcr. Package documentation may be viewed at https://dajmcdon.github.io/suffpcr.

CONTACT

daniel@stat.ubc.ca.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Advances online.
