Shabalin Andrey A, Tjelmeland Håkon, Fan Cheng, Perou Charles M, Nobel Andrew B
Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, NC, USA.
Bioinformatics. 2008 May 1;24(9):1154-60. doi: 10.1093/bioinformatics/btn083. Epub 2008 Mar 5.
Gene-expression microarrays are currently used in a wide range of biomedical applications. This article considers the problem of how to merge datasets arising from different gene-expression studies of a common organism and phenotype. Of particular interest is how to merge data from different technological platforms.
The article makes two contributions to the problem. The first is a simple cross-study normalization method, which is based on linked gene/sample clustering of the given datasets. The second is the introduction and description of several general validation measures that can be used to assess and compare cross-study normalization methods. The proposed normalization method is applied to three existing breast cancer datasets, and is compared to several competing normalization methods using the proposed validation measures.
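The abstract describes the normalization only as "based on linked gene/sample clustering". As a rough illustration of that idea, the Python sketch below does the following. It centers each gene within each platform. It clusters the genes (rows) and the samples from both platforms (columns) of the combined matrix with plain k-means. It then shifts each platform so that its mean inside every gene-cluster × sample-cluster block matches the pooled block mean. This is an assumed, simplified additive variant. It is not the XPN estimator from the paper or from the published Matlab code. The function names `kmeans`, `center_rows` and `block_shift_normalize`, and the default cluster counts, are hypothetical.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=25, seed=0):
    """Lloyd's k-means on equal-length vectors; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to the nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; an empty cluster keeps its old center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def center_rows(m):
    """Subtract each gene's (row's) mean within a single platform."""
    out = []
    for row in m:
        mu = sum(row) / len(row)
        out.append([v - mu for v in row])
    return out


def block_shift_normalize(x, y, k_genes=2, k_samples=2, seed=0):
    """Hypothetical simplified cross-platform adjustment (NOT the published XPN estimator).

    x, y: genes-by-samples matrices (lists of rows) sharing the same gene order.
    Returns (adjusted x, adjusted y, gene cluster labels, sample cluster labels).
    """
    n1 = len(x[0])
    x, y = center_rows(x), center_rows(y)
    combined = [rx + ry for rx, ry in zip(x, y)]
    # Linked clustering: genes are clustered on the rows of the combined matrix,
    # and samples from both platforms are clustered together on its columns.
    gene_lab = kmeans(combined, k_genes, seed=seed)
    samp_lab = kmeans([list(col) for col in zip(*combined)], k_samples, seed=seed)

    # Accumulate block sums and counts per platform (0 = x, 1 = y) and pooled.
    sums, counts = defaultdict(float), defaultdict(int)
    for g, row in enumerate(combined):
        for s, v in enumerate(row):
            platform = 0 if s < n1 else 1
            block = (gene_lab[g], samp_lab[s])
            for key in ((platform, block), ("pooled", block)):
                sums[key] += v
                counts[key] += 1

    def mean(key):
        return sums[key] / counts[key]

    # Shift each entry so that its platform's block mean equals the pooled block mean.
    adjusted = [
        [
            v - mean((0 if s < n1 else 1, (gene_lab[g], samp_lab[s])))
            + mean(("pooled", (gene_lab[g], samp_lab[s])))
            for s, v in enumerate(row)
        ]
        for g, row in enumerate(combined)
    ]
    return [r[:n1] for r in adjusted], [r[n1:] for r in adjusted], gene_lab, samp_lab
```

For example, `block_shift_normalize(x, y)` with a 4-gene × 3-sample `x` and a 4-gene × 2-sample `y` returns matrices of the same shapes. Within every block, the adjusted entries from each platform have the same mean. The additive shift keeps the sketch short. Because k-means depends on its random initialization, repeating the procedure over several seeds and averaging the results would be one way to reduce that variability.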
The supplementary materials and XPN Matlab code are publicly available at https://genome.unc.edu/xpn