A flexible, interpretable, and accurate approach for imputing the expression of unmeasured genes.

Affiliations

Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA.

Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA.

Publication information

Nucleic Acids Res. 2020 Dec 2;48(21):e125. doi: 10.1093/nar/gkaa881.

DOI:10.1093/nar/gkaa881
PMID:33074331
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7708069/
Abstract

While there are >2 million publicly-available human microarray gene-expression profiles, these profiles were measured using a variety of platforms that each cover a pre-defined, limited set of genes. Therefore, key to reanalyzing and integrating this massive data collection are methods that can computationally reconstitute the complete transcriptome in partially-measured microarray samples by imputing the expression of unmeasured genes. Current state-of-the-art imputation methods are tailored to samples from a specific platform and rely on gene-gene relationships regardless of the biological context of the target sample. We show that sparse regression models that capture sample-sample relationships (termed SampleLASSO), built on-the-fly for each new target sample to be imputed, outperform models based on fixed gene relationships. Extensive evaluation involving three machine learning algorithms (LASSO, k-nearest-neighbors, and deep-neural-networks), two gene subsets (GPL96-570 and LINCS), and multiple imputation tasks (within and across microarray/RNA-seq datasets) establishes that SampleLASSO is the most accurate model. Additionally, we demonstrate the biological interpretability of this method by showing that, for imputing a target sample from a certain tissue, SampleLASSO automatically leverages training samples from the same tissue. Thus, SampleLASSO is a simple, yet powerful and flexible approach for harmonizing large-scale gene-expression data.
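The abstract's central idea is to fit a fresh sparse regression for each target sample, with the training samples (not the genes) as predictors. The sketch below illustrates that idea in a few lines. It is not the authors' implementation. It assumes a fully measured reference matrix `train` (genes × training samples), uses a hand-written coordinate-descent LASSO with a fixed penalty `alpha=0.01` instead of a tuned solver, and runs on synthetic data built from two hypothetical "tissue" expression programs. The names `lasso_cd`, `sample_lasso_impute`, and `demo` are illustrative only.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator used in LASSO coordinate descent."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_cd(X, y, alpha, n_iter=500, tol=1e-8):
    """Minimize (1/(2n))||y - Xw - b||^2 + alpha*||w||_1 by coordinate descent.

    X: (n_obs, n_features), y: (n_obs,). Returns (w, b).
    Hand-written for illustration; not the solver used in the paper.
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, resid = X - x_mean, y - y_mean        # center so the intercept drops out
    col_sq = (Xc ** 2).sum(axis=0) / n
    w = np.zeros(p)
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual, adding back its own contribution
            rho = Xc[:, j] @ resid / n + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                resid -= Xc[:, j] * delta     # keep the residual in sync with w
                w[j] = new_wj
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return w, y_mean - x_mean @ w


def sample_lasso_impute(train, target_measured, measured_idx, alpha=0.01):
    """Sample-sample sparse regression, refit for every target sample.

    Observations are the measured genes; features are the training samples.
    The fitted weights combine the training samples at the unmeasured genes.
    """
    n_genes = train.shape[0]
    unmeasured = np.setdiff1d(np.arange(n_genes), measured_idx)
    w, b = lasso_cd(train[measured_idx, :], target_measured, alpha)
    full = np.empty(n_genes)
    full[measured_idx] = target_measured
    full[unmeasured] = train[unmeasured, :] @ w + b
    return full, w


def demo(seed=0):
    """Synthetic check with two hypothetical tissues; the target comes from tissue A."""
    rng = np.random.default_rng(seed)
    n_genes, n_per_tissue, noise = 200, 10, 0.1
    base_a, base_b = rng.normal(size=n_genes), rng.normal(size=n_genes)
    train_a = base_a[:, None] + noise * rng.normal(size=(n_genes, n_per_tissue))
    train_b = base_b[:, None] + noise * rng.normal(size=(n_genes, n_per_tissue))
    train = np.hstack([train_a, train_b])
    target = base_a + noise * rng.normal(size=n_genes)

    measured_idx = np.arange(150)             # first 150 genes "measured"
    unmeasured = np.arange(150, n_genes)
    imputed, w = sample_lasso_impute(train, target[measured_idx], measured_idx)

    rmse = np.sqrt(np.mean((imputed[unmeasured] - target[unmeasured]) ** 2))
    # Naive per-gene mean over all training samples, used only for comparison
    baseline = train[unmeasured].mean(axis=1)
    rmse_baseline = np.sqrt(np.mean((baseline - target[unmeasured]) ** 2))
    weight_a = np.abs(w[:n_per_tissue]).sum()
    weight_b = np.abs(w[n_per_tissue:]).sum()
    return rmse, rmse_baseline, weight_a, weight_b
```

`demo()` returns four values: the RMSE on the held-out genes, the RMSE of a naive per-gene mean (a comparison baseline added here, not one from the paper), and the total absolute weight placed on each group of training samples. In this setup the weights should concentrate on the tissue-A samples, which mirrors the interpretability claim in the abstract. Synthetic data of this kind says nothing about accuracy on real microarray or RNA-seq profiles.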

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2b6/7708069/efe27d546630/gkaa881fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2b6/7708069/a43e5d625aa0/gkaa881fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2b6/7708069/8e43a3b723aa/gkaa881fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2b6/7708069/a136eeca5b46/gkaa881fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2b6/7708069/e9b408799351/gkaa881fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2b6/7708069/c84655fb3d56/gkaa881fig6.jpg

Similar articles

1
A flexible, interpretable, and accurate approach for imputing the expression of unmeasured genes.
Nucleic Acids Res. 2020 Dec 2;48(21):e125. doi: 10.1093/nar/gkaa881.
2
scHinter: imputing dropout events for single-cell RNA-seq data with limited sample size.
Bioinformatics. 2020 Feb 1;36(3):789-797. doi: 10.1093/bioinformatics/btz627.
3
Missing value imputation for gene expression data by tailored nearest neighbors.
Stat Appl Genet Mol Biol. 2017 Apr 25;16(2):95-106. doi: 10.1515/sagmb-2015-0098.
4
Probe Region Expression Estimation for RNA-Seq Data for Improved Microarray Comparability.
PLoS One. 2015 May 12;10(5):e0126545. doi: 10.1371/journal.pone.0126545. eCollection 2015.
5
Effects of replacing the unreliable cDNA microarray measurements on the disease classification based on gene expression profiles and functional modules.
Bioinformatics. 2006 Dec 1;22(23):2883-9. doi: 10.1093/bioinformatics/btl339. Epub 2006 Jun 29.
6
Imputation of spatially-resolved transcriptomes by graph-regularized tensor completion.
PLoS Comput Biol. 2021 Apr 7;17(4):e1008218. doi: 10.1371/journal.pcbi.1008218. eCollection 2021 Apr.
7
Mining the Archives: A Cross-Platform Analysis of Gene Expression Profiles in Archival Formalin-Fixed Paraffin-Embedded Tissues.
Toxicol Sci. 2015 Dec;148(2):460-72. doi: 10.1093/toxsci/kfv195. Epub 2015 Sep 10.
8
A hybrid imputation approach for microarray missing value estimation.
BMC Genomics. 2015;16 Suppl 9(Suppl 9):S1. doi: 10.1186/1471-2164-16-S9-S1. Epub 2015 Aug 17.
9
Missing value imputation for epistatic MAPs.
BMC Bioinformatics. 2010 Apr 20;11:197. doi: 10.1186/1471-2105-11-197.
10
scNPF: an integrative framework assisted by network propagation and network fusion for preprocessing of single-cell RNA-seq data.
BMC Genomics. 2019 May 8;20(1):347. doi: 10.1186/s12864-019-5747-5.

Cited by

1
Transcriptome-wide analyses delineate the genetic architecture of expression variation in atopic dermatitis.
HGG Adv. 2025 Apr 10;6(2):100422. doi: 10.1016/j.xhgg.2025.100422. Epub 2025 Feb 26.
2
Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously.
Commun Biol. 2023 Feb 25;6(1):222. doi: 10.1038/s42003-023-04588-6.
3
Risk-score model to predict prognosis of malignant airway obstruction after interventional bronchoscopy.
Transl Lung Cancer Res. 2021 Jul;10(7):3173-3190. doi: 10.21037/tlcr-21-301.
4
Reconciling multiple connectivity scores for drug repurposing.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab161.