Novel and simple transformation algorithm for combining microarray data sets.

Author information

Kim Ki-Yeol, Ki Dong Hyuk, Jeong Ha Jin, Jeung Hei-Cheul, Chung Hyun Cheol, Rha Sun Young

Affiliation

Oral Cancer Research Institute, Yonsei University College of Dentistry, Seoul, Korea.

Publication information

BMC Bioinformatics. 2007 Jun 25;8:218. doi: 10.1186/1471-2105-8-218.

DOI:10.1186/1471-2105-8-218

PMID:17588268

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1914088/

Abstract

BACKGROUND

In microarray experiments, variability in experimental conditions, such as RNA source, microarray production, or the use of different platforms, can introduce bias. Such systematic differences are a substantial obstacle to the analysis of microarray data and lead to inconsistent and unreliable results. One of the most pressing challenges in the field is therefore how to integrate results from different microarray experiments, or how to combine data sets before a specific analysis.

RESULTS

Two microarray data sets based on a 17k cDNA microarray system were used, consisting of 82 normal colon mucosa and 72 colorectal cancer tissues. Each data set was prepared from either total RNA or amplified mRNA, and the difference in RNA source between the two data sets was detected with an ANOVA (analysis of variance) model. A simple integration method was introduced, based on the distributions of gene expression ratios across the different microarray data sets. The method transformed gene expression ratios into the form of a reference data set on a gene-by-gene basis. Hierarchical clustering analysis, density and box plots, and mixture scores with correlation coefficients showed that the two data sets were well intermingled, indicating that the proposed method minimized the experimental bias. In addition, no RNA source effect was detected after the proposed transformation was applied. In the combined data set, the two previously identified subgroups, normal and tumor, were well separated, and the efficiency of integration was more prominent in the tumor groups than in the normal groups. The transformation was slightly more effective when a data set with strong homogeneity within the same experimental group was used as the reference data set.
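The abstract does not give the transformation formula, so the following is only a minimal sketch of one plausible reading of "transforming gene expression ratios into the form of a reference data set on a gene-by-gene basis": each gene in the data set to be adjusted is standardized and then rescaled to that gene's mean and standard deviation in the reference data set. The function name `transform_to_reference`, the genes-by-samples matrix layout, and the simulated data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def transform_to_reference(target, reference, eps=1e-12):
    """Rescale each gene (row) of `target` to the mean and SD of the same
    gene in `reference`. Hypothetical helper: per-gene location-scale
    matching is an assumed stand-in for the method described in the text.
    """
    target = np.asarray(target, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if target.shape[0] != reference.shape[0]:
        raise ValueError("both data sets must hold the same genes in the same row order")

    # Per-gene summary statistics (rows = genes, columns = samples).
    t_mean = target.mean(axis=1, keepdims=True)
    t_sd = target.std(axis=1, ddof=1, keepdims=True)
    r_mean = reference.mean(axis=1, keepdims=True)
    r_sd = reference.std(axis=1, ddof=1, keepdims=True)

    # Standardize within the target set, then map onto the reference scale.
    z = (target - t_mean) / np.maximum(t_sd, eps)
    return z * r_sd + r_mean


# Simulated log-ratios: 5 genes, with a shift and scale difference
# standing in for an RNA-source effect between the two data sets.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(5, 40))
target = rng.normal(1.5, 2.0, size=(5, 30))
adjusted = transform_to_reference(target, reference)
```

After this adjustment each gene has the same mean and standard deviation in both data sets, which is the kind of agreement the density and box plots mentioned above would be expected to show. Whether the matching should be done within each experimental group (normal, tumor) rather than across all samples is a design choice that the abstract leaves open.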

CONCLUSION

The proposed method is simple but useful for combining several data sets obtained under different experimental conditions. With this method, biologically useful information can be detected by applying various analytic methods to the combined data set, which has an increased sample size.

Similar articles

The latent process decomposition of cDNA microarray data sets.

IEEE/ACM Trans Comput Biol Bioinform. 2005 Apr-Jun;2(2):143-56. doi: 10.1109/TCBB.2005.29.

An attempt for combining microarray data sets by adjusting gene expressions.

Cancer Res Treat. 2007 Jun;39(2):74-81. doi: 10.4143/crt.2007.39.2.74. Epub 2007 Jun 30.

Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes.

BMC Bioinformatics. 2006 Aug 31;7:397. doi: 10.1186/1471-2105-7-397.

Integrative missing value estimation for microarray data.

BMC Bioinformatics. 2006 Oct 12;7:449. doi: 10.1186/1471-2105-7-449.

EMMA: a platform for consistent storage and efficient analysis of microarray data.

J Biotechnol. 2003 Dec 19;106(2-3):135-46. doi: 10.1016/j.jbiotec.2003.08.010.

Gene Vector Analysis (Geneva): a unified method to detect differentially-regulated gene sets and similar microarray experiments.

BMC Bioinformatics. 2008 Aug 22;9:348. doi: 10.1186/1471-2105-9-348.

lumi: a pipeline for processing Illumina microarray.

Bioinformatics. 2008 Jul 1;24(13):1547-8. doi: 10.1093/bioinformatics/btn224. Epub 2008 May 8.

SEGS: search for enriched gene sets in microarray data.

J Biomed Inform. 2008 Aug;41(4):588-601. doi: 10.1016/j.jbi.2007.12.001. Epub 2007 Dec 15.

Combining multiple microarray studies using bootstrap meta-analysis.

Annu Int Conf IEEE Eng Med Biol Soc. 2008;2008:5660-3. doi: 10.1109/IEMBS.2008.4650498.

Cited by

Pathway-Based Analysis of the Liver Response to Intravenous Methylprednisolone Administration in Rats: Acute Versus Chronic Dosing.

Gene Regul Syst Bio. 2019 Apr 15;13:1177625019840282. doi: 10.1177/1177625019840282. eCollection 2019.

Development of novel predictive miRNA/target gene pathways for colorectal cancer distance metastasis to the liver using a bioinformatic approach.

PLoS One. 2019 Feb 26;14(2):e0211968. doi: 10.1371/journal.pone.0211968. eCollection 2019.

Effect of data combination on predictive modeling: a study using gene expression data.

AMIA Annu Symp Proc. 2010 Nov 13;2010:567-71.

Comparative analysis of acute and chronic corticosteroid pharmacogenomic effects in rat liver: transcriptional dynamics and regulatory structures.

BMC Bioinformatics. 2010 Oct 14;11:515. doi: 10.1186/1471-2105-11-515.

MAID : an effect size based model for microarray data integration across laboratories and platforms.

BMC Bioinformatics. 2008 Jul 10;9:305. doi: 10.1186/1471-2105-9-305.

Improving the prediction accuracy in classification using the combined data sets by ranks of gene expressions.

BMC Bioinformatics. 2008 Jun 16;9:283. doi: 10.1186/1471-2105-9-283.
