Biomarker discovery in heterogeneous tissue samples - taking the in-silico deconfounding approach.

Affiliation

Department of Genetics and Biometry, Research Institute for Biology of Farm Animals, Wilhelm-Stahl Allee 2, D 18196 Dummerstorf, Germany.

Publication information

BMC Bioinformatics. 2010 Jan 14;11:27. doi: 10.1186/1471-2105-11-27.

PMID: 20070912

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3098067/

Abstract

BACKGROUND

For heterogeneous tissues, such as blood, measurements of gene expression are confounded by relative proportions of cell types involved. Conclusions have to rely on estimation of gene expression signals for homogeneous cell populations, e.g. by applying micro-dissection, fluorescence activated cell sorting, or in-silico deconfounding. We studied feasibility and validity of a non-negative matrix decomposition algorithm using experimental gene expression data for blood and sorted cells from the same donor samples. Our objective was to optimize the algorithm regarding detection of differentially expressed genes and to enable its use for classification in the difficult scenario of reversely regulated genes. This would be of importance for the identification of candidate biomarkers in heterogeneous tissues.
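
The idea behind a non-negative matrix decomposition can be sketched as follows. The sketch assumes a linear mixing model in which a genes-by-samples expression matrix `X` is approximated by the product of a genes-by-cell-types signature matrix `S` and a cell-types-by-samples proportion matrix `C`, all non-negative. The function name `nmf_deconfound`, the symbols `X`, `S`, `C`, and the use of standard multiplicative updates are illustrative assumptions; the abstract does not specify the update rule of the published algorithm.

```python
import numpy as np

def nmf_deconfound(X, k, n_iter=1000, seed=0, eps=1e-9):
    """Factor X (genes x samples) into S (genes x k) and C (k x samples).

    Illustrative only: plain multiplicative updates that reduce the
    squared reconstruction error while keeping S and C non-negative.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    # Strictly positive random start; multiplicative updates keep signs.
    S = rng.random((n_genes, k)) + eps
    C = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Update the per-sample mixing weights with the signatures fixed.
        C *= (S.T @ X) / (S.T @ S @ C + eps)
        # Update the cell-type signatures with the mixing weights fixed.
        S *= (X @ C.T) / (S @ C @ C.T + eps)
    return S, C
```

The factors are only defined up to scaling and permutation of the `k` components, so matching a component to a named cell type needs additional information, such as expression measured on sorted cells.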

RESULTS

Experimental data and simulation studies involving noise parameters estimated from these data revealed that for valid detection of differential gene expression, quantile normalization and use of non-log data are optimal. We demonstrate the feasibility of predicting proportions of constituent cell types from gene expression data of single samples, as a prerequisite for a deconfounding-based classification approach. Classification cross-validation errors with and without using deconfounding results are reported as well as sample-size dependencies. Implementation of the algorithm, simulation and analysis scripts are available.
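
Two of the steps named above, quantile normalization of non-log data and prediction of cell-type proportions for a single sample, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `quantile_normalize` and `estimate_proportions`, the assumption that a signature matrix `S` (genes by cell types) is already known, and the multiplicative update used to solve the non-negative least-squares problem are all assumptions made for the example.

```python
import numpy as np

def quantile_normalize(X):
    """Give every sample (column) of X the same empirical distribution.

    X holds raw, non-log intensities (genes x samples). Ties are broken
    by sort order rather than averaged, which is a simplification.
    """
    order = np.argsort(X, axis=0)                 # per-column gene ranking
    mean_quantiles = np.sort(X, axis=0).mean(axis=1)
    Xn = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        # The gene ranked r-th in sample j receives the r-th mean quantile.
        Xn[order[:, j], j] = mean_quantiles
    return Xn

def estimate_proportions(x, S, n_iter=2000, eps=1e-12):
    """Estimate non-negative mixing weights c with x ~ S @ c, then
    rescale them to sum to one so they read as proportions."""
    k = S.shape[1]
    c = np.full(k, 1.0 / k)                       # uniform starting guess
    StX = S.T @ x
    StS = S.T @ S
    for _ in range(n_iter):
        # Multiplicative update: keeps c non-negative at every step.
        c *= StX / (StS @ c + eps)
    return c / c.sum()
```

In use, `x` would be one column of the normalized non-log expression matrix, and `S` would come from a previous decomposition or from sorted-cell measurements.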

CONCLUSIONS

The deconfounding algorithm without decorrelation using quantile normalization on non-log data is proposed for biomarkers that are difficult to detect, and for cases where confounding by varying proportions of cell types is the suspected reason. In this case, a deconfounding ranking approach can be used as a powerful alternative to, or complement of, other statistical learning approaches to define candidate biomarkers for molecular diagnosis and prediction in biomedicine, in realistically noisy conditions and with moderate sample sizes.
