Improving the prediction accuracy in classification using the combined data sets by ranks of gene expressions.

Author information

Kim Ki-Yeol, Ki Dong Hyuk, Jeung Hei-Cheul, Chung Hyun Cheol, Rha Sun Young

Affiliation

Oral Cancer Research Institute, Yonsei University College of Dentistry, Seoul, 120-752, South Korea.

Publication information

BMC Bioinformatics. 2008 Jun 16;9:283. doi: 10.1186/1471-2105-9-283.

PMID:18554423

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2442106/

Abstract

BACKGROUND

Data sets generated under different experimental conditions may yield inconsistent information, even when the experiments share the same research objective. Moreover, even when data sets are generated on the same platform, agreement between them may be affected by technical variation among laboratories. In such cases, the differences between the data sets should be adjusted before they are combined, so that more reliable information can be detected from the combined data set.

RESULTS

The proposed method discretizes each data set based on the ranks of the gene expression ratios and then combines the discretized data sets; a statistical method is applied to the combined data set to select predictive genes. The efficiency of the method was evaluated using five colon cancer-related data sets generated with cDNA microarrays using different RNA sources; one experiment used oligonucleotide arrays. NCI-60 cell line data sets, generated on two different platforms (cDNA microarrays and Affymetrix HU6800 oligonucleotide arrays), were also used. The data set combined by the proposed method predicted the test data sets more accurately than the individual data sets did, and biologically significant genes that were missed in the individual data sets were detected in the combined data set.
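The abstract describes the pipeline only at a high level: discretize each data set by the ranks of its expression ratios, pool the discretized data sets, then apply a statistical method for gene selection. The Python sketch below illustrates that order of operations under stated assumptions; it is not the authors' implementation. The number of levels (`n_levels`), the equal-size rank bins, restricting to genes shared by all arrays, and the `mean_level_difference` score are all hypothetical choices, since the abstract names neither the binning rule nor the statistical test.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: one array = {gene id: expression ratio}.
Sample = Dict[str, float]


def rank_discretize(sample: Sample, n_levels: int = 3) -> Dict[str, int]:
    """Replace each ratio by a level derived from its rank within the array.

    Assumption: genes are split into n_levels equal-size rank bins,
    0 = lowest ratios, n_levels - 1 = highest. Ties keep input order.
    """
    ordered = sorted(sample, key=sample.__getitem__)  # ascending by ratio
    n = len(ordered)
    return {gene: rank * n_levels // n for rank, gene in enumerate(ordered)}


def combine(datasets: Sequence[Sequence[Sample]], n_levels: int = 3) -> List[Dict[str, int]]:
    """Discretize every array on the genes shared by all arrays, then pool the arrays."""
    common = set.intersection(*(set(s) for ds in datasets for s in ds))
    return [
        rank_discretize({g: s[g] for g in common}, n_levels)
        for ds in datasets
        for s in ds
    ]


def mean_level_difference(combined: List[Dict[str, int]], labels: Sequence[int]) -> Dict[str, float]:
    """Hypothetical stand-in for the gene-selection statistic:
    |mean level in class 1 - mean level in class 0| per gene.
    Assumes both classes are present."""
    scores = {}
    for gene in combined[0]:
        c1 = [s[gene] for s, y in zip(combined, labels) if y == 1]
        c0 = [s[gene] for s, y in zip(combined, labels) if y == 0]
        scores[gene] = abs(sum(c1) / len(c1) - sum(c0) / len(c0))
    return scores


if __name__ == "__main__":
    # Toy data: two "data sets" whose ratios are on very different scales.
    ds_a = [{"g1": 2.0, "g2": 0.5, "g3": 1.0}, {"g1": 0.4, "g2": 3.0, "g3": 1.1}]
    ds_b = [
        {"g1": 900.0, "g2": 10.0, "g3": 50.0, "g4": 7.0},
        {"g1": 5.0, "g2": 800.0, "g3": 60.0, "g4": 1.0},
    ]
    combined = combine([ds_a, ds_b])  # g4 is dropped: absent from ds_a
    scores = mean_level_difference(combined, labels=[1, 0, 1, 0])
    print(sorted(scores.items()))  # [('g1', 2.0), ('g2', 2.0), ('g3', 0.0)]
```

Each array is discretized against its own ranks, so arrays whose ratios are on very different scales (here `ds_b` is much larger than `ds_a`) land on the same 0 to `n_levels - 1` scale before they are pooled; this is why the sketch discretizes first and combines second.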

CONCLUSION

Because gene expression values are transformed into ranks, the proposed method is not influenced by systematic bias among chips or by the choice of normalization method. The method may be especially useful for finding predictive genes in data sets whose gene expression values are on different scales.
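The conclusion rests on a simple property of ranks: any strictly increasing transformation applied to every gene on an array (rescaling, a log transform, an additive offset) leaves the within-array order, and therefore the discretized levels, unchanged. The short Python check below illustrates this, assuming equal-size rank bins; the function names, the binning rule, and the example values are hypothetical and not taken from the paper.

```python
import math
from typing import Callable, Dict


def rank_discretize(sample: Dict[str, float], n_levels: int = 3) -> Dict[str, int]:
    """Level of each gene from its within-array rank (equal-size rank bins; assumed)."""
    ordered = sorted(sample, key=sample.__getitem__)  # ascending by value
    n = len(ordered)
    return {gene: rank * n_levels // n for rank, gene in enumerate(ordered)}


def is_rank_invariant(sample: Dict[str, float], transform: Callable[[float], float]) -> bool:
    """True if applying `transform` to every value on the array leaves the levels unchanged."""
    transformed = {g: transform(v) for g, v in sample.items()}
    return rank_discretize(transformed) == rank_discretize(sample)


if __name__ == "__main__":
    raw = {"g1": 2.0, "g2": 0.5, "g3": 1.0, "g4": 8.0, "g5": 0.25, "g6": 4.0}
    # Array-wide, strictly increasing changes of the kind that can differ between
    # chips or normalization pipelines (illustrative examples only).
    checks = {
        "scale x100": lambda x: 100.0 * x,
        "log2": math.log2,
        "additive offset": lambda x: x + 3.0,
    }
    for name, f in checks.items():
        print(name, is_rank_invariant(raw, f))  # each check reports True
    # A change that reverses the gene order is not invariant.
    print("negate", is_rank_invariant(raw, lambda x: -x))  # reports False
```

The property holds only for transformations that are strictly increasing and applied to the whole array; gene-specific corrections, or intensity-dependent adjustments that reorder genes, can still change the levels.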

Similar articles

Methods for evaluating gene expression from Affymetrix microarray datasets.

BMC Bioinformatics. 2008 Jun 17;9:284. doi: 10.1186/1471-2105-9-284.

Novel and simple transformation algorithm for combining microarray data sets.

BMC Bioinformatics. 2007 Jun 25;8:218. doi: 10.1186/1471-2105-8-218.

Improving gene set analysis of microarray data by SAM-GS.

BMC Bioinformatics. 2007 Jul 5;8:242. doi: 10.1186/1471-2105-8-242.

Redefinition of Affymetrix probe sets by sequence overlap with cDNA microarray probes reduces cross-platform inconsistencies in cancer-associated gene expression measurements.

BMC Bioinformatics. 2005 Apr 25;6:107. doi: 10.1186/1471-2105-6-107.

A meta-data based method for DNA microarray imputation.

BMC Bioinformatics. 2007 Mar 29;8:109. doi: 10.1186/1471-2105-8-109.

Extracting gene expression patterns and identifying co-expressed genes from microarray data reveals biologically responsive processes.

BMC Bioinformatics. 2007 Nov 2;8:427. doi: 10.1186/1471-2105-8-427.

A new method for class prediction based on signed-rank algorithms applied to Affymetrix microarray experiments.

BMC Bioinformatics. 2008 Jan 11;9:16. doi: 10.1186/1471-2105-9-16.

Orthogonal projections to latent structures as a strategy for microarray data normalization.

BMC Bioinformatics. 2007 Jun 18;8:207. doi: 10.1186/1471-2105-8-207.

Normalization for Affymetrix GeneChips.

Methods Inf Med. 2005;44(3):414-7.

Cited by

Development of novel predictive miRNA/target gene pathways for colorectal cancer distance metastasis to the liver using a bioinformatic approach.

PLoS One. 2019 Feb 26;14(2):e0211968. doi: 10.1371/journal.pone.0211968. eCollection 2019.

Apontic directly activates hedgehog and cyclin E for proper organ growth and patterning.

Sci Rep. 2017 Sep 29;7(1):12470. doi: 10.1038/s41598-017-12766-w.

Evolutionarily conserved transcription factor Apontic controls the G1/S progression by inducing cyclin E during eye development.

Proc Natl Acad Sci U S A. 2014 Jul 1;111(26):9497-502. doi: 10.1073/pnas.1407145111. Epub 2014 Jun 16.

Possibility of the use of public microarray database for identifying significant genes associated with oral squamous cell carcinoma.

Genomics Inform. 2012 Mar;10(1):23-32. doi: 10.5808/GI.2012.10.1.23. Epub 2012 Mar 31.

A method for detecting significant genomic regions associated with oral squamous cell carcinoma using aCGH.

Med Biol Eng Comput. 2010 May;48(5):459-68. doi: 10.1007/s11517-010-0595-0. Epub 2010 Mar 20.

Conserved expression patterns predict microRNA targets.

PLoS Comput Biol. 2009 Sep;5(9):e1000513. doi: 10.1371/journal.pcbi.1000513. Epub 2009 Sep 25.
