Improving the prediction accuracy in classification using the combined data sets by ranks of gene expressions.

Author information

Kim Ki-Yeol, Ki Dong Hyuk, Jeung Hei-Cheul, Chung Hyun Cheol, Rha Sun Young

Affiliations

Oral Cancer Research Institute, Yonsei University College of Dentistry, Seoul, 120-752, South Korea.

Publication information

BMC Bioinformatics. 2008 Jun 16;9:283. doi: 10.1186/1471-2105-9-283.

Abstract

BACKGROUND

Data sets generated under different experimental conditions may yield inconsistent information even when the experiments share the same research objective. Moreover, even when data sets are generated on the same platform, their agreement may be affected by technical variation among laboratories. In such cases, the data sets should be combined after adjusting for the differences between them, so that more reliable information can be detected.

RESULTS

The proposed method discretizes each data set based on the ranks of the gene expression ratios and then combines the discretized data sets; a statistical method is applied to the combined data set to select predictive genes. The efficiency of the proposed method was evaluated using five colon cancer related data sets, which were generated using cDNA microarrays with different RNA sources, and one experiment used oligonucleotide arrays. NCI-60 cell line data sets, generated on two different platforms (cDNA microarrays and Affymetrix HU6800 oligonucleotide arrays), were also used. The data set combined by the proposed method predicted the test data sets more accurately than the separate data sets did. Biologically significant genes that were missed in the separate data sets were detected in the combined data set.
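The abstract does not specify the discretization rule or the statistical test, so the following is only a minimal sketch of the general idea under stated assumptions: genes are ranked within each sample, ranks are binned into three equal-sized levels, data sets sharing the same gene rows are concatenated sample-wise, and genes are scored with a Pearson chi-square statistic against the class labels. The function names, the three-level binning, and the chi-square scoring are all illustrative choices, not the authors' implementation.

```python
import numpy as np

def discretize_by_rank(expr, n_levels=3):
    """Rank genes within each sample (column) and bin the ranks into
    n_levels equal-sized groups. The binning rule is an assumption."""
    expr = np.asarray(expr, dtype=float)
    n_genes = expr.shape[0]
    # argsort of argsort yields 0-based ranks per column
    # (ties, if any, are broken arbitrarily)
    ranks = expr.argsort(axis=0).argsort(axis=0)
    return (ranks * n_levels) // n_genes  # integer levels 0 .. n_levels-1

def chi_square_score(levels, labels, n_levels=3):
    """Pearson chi-square statistic between one gene's discretized
    levels and the class labels (an assumed scoring choice)."""
    classes = np.unique(labels)
    table = np.array(
        [[np.sum((levels == lv) & (labels == c)) for c in classes]
         for lv in range(n_levels)],
        dtype=float,
    )
    expected = (table.sum(axis=1, keepdims=True)
                * table.sum(axis=0, keepdims=True) / table.sum())
    mask = expected > 0  # skip empty rows/columns of the table
    return float((((table - expected) ** 2)[mask] / expected[mask]).sum())

def select_genes(data_sets, label_sets, n_levels=3, top_k=2):
    """Discretize each data set separately, concatenate the samples,
    and return the indices of the top_k highest-scoring genes."""
    combined = np.hstack([discretize_by_rank(d, n_levels) for d in data_sets])
    labels = np.concatenate(label_sets)
    scores = np.array(
        [chi_square_score(row, labels, n_levels) for row in combined]
    )
    return np.argsort(scores)[::-1][:top_k], scores

# Toy data (invented for illustration): 6 genes x 4 samples per data set.
set_a = np.array([
    [0.1, 0.2, 5.0, 6.0],   # gene 0: low in class 0, high in class 1
    [1.0, 1.1, 1.0, 1.2],
    [1.5, 1.4, 1.5, 1.3],
    [2.0, 2.1, 2.0, 2.2],
    [2.5, 2.6, 2.5, 2.4],
    [7.0, 8.0, 0.3, 0.2],   # gene 5: high in class 0, low in class 1
])
set_b = set_a * 1000.0 + 50.0          # same biology on a different scale
labels_a = np.array([0, 0, 1, 1])
labels_b = np.array([0, 0, 1, 1])

top, scores = select_genes([set_a, set_b], [labels_a, labels_b])
print(sorted(top.tolist()), scores)
```

Because each data set is discretized on its own before concatenation, the two toy data sets contribute identical levels despite their very different numeric scales, which is the property that makes pooling the samples reasonable in this sketch.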

CONCLUSION

By transforming gene expressions using ranks, the proposed method is not influenced by systematic bias among chips or by the normalization method. The method may be especially useful for finding predictive genes in data sets whose gene expression values are on different scales.
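One way to see why a rank transformation is insensitive to this kind of bias: within-sample ranks do not change under any strictly increasing transformation of a chip's values. The short check below illustrates this with invented numbers; the per-chip gain, offset, and log2 step are hypothetical distortions chosen for the example, and the check covers only bias that is monotone within a chip.

```python
import numpy as np

def within_sample_ranks(expr):
    """0-based rank of each gene within each sample (column)."""
    expr = np.asarray(expr, dtype=float)
    return expr.argsort(axis=0).argsort(axis=0)

# Invented expression values: 3 genes x 2 chips.
raw = np.array([
    [0.5, 2.0],
    [1.5, 0.7],
    [3.0, 1.1],
])

# Hypothetical chip-specific distortion: a different positive gain and
# offset per chip, followed by a log2 transform (all strictly increasing).
gain = np.array([10.0, 250.0])
offset = np.array([1.0, 30.0])
distorted = np.log2(raw * gain + offset)

# The values differ, but the within-chip ordering of genes does not.
print(np.array_equal(within_sample_ranks(raw), within_sample_ranks(distorted)))
```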

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb01/2442106/56d8bf0a13ba/1471-2105-9-283-1.jpg
