MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China.
Nucleic Acids Res. 2013 Aug;41(14):e143. doi: 10.1093/nar/gkt343. Epub 2013 Jun 12.
Gene expression profiling has gradually become a routine procedure for disease diagnosis and classification. Over the past decade, many computational methods have been proposed, bringing great improvements at various levels, including feature selection and algorithms for classification and clustering. In this study, we present iPcc, a novel method that takes a feature extraction perspective, with the aim of moving gene expression profiling technologies further from bench to bedside. We define a 'correlation feature space' for samples, based on their gene expression profiles, by iteratively applying Pearson's correlation coefficient. Numerical experiments on both simulated and real gene expression data sets demonstrate that iPcc can markedly highlight the latent patterns underlying noisy gene expression data and thus greatly improve the robustness and accuracy of currently available algorithms for disease diagnosis and classification based on gene expression profiles.
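The abstract does not spell out the iteration, so the following is only a minimal sketch of one plausible reading of "iterative employment of Pearson's correlation coefficient": the sample-by-sample correlation matrix is computed from the expression profiles, and its rows are then treated as the new feature vectors for the next round. The function name `iterated_pearson`, the parameter `n_iter`, and the simulated data are hypothetical and are not taken from the paper.

```python
import numpy as np


def iterated_pearson(expr, n_iter=3):
    """Repeatedly map samples into a correlation-based feature space.

    expr   : array of shape (n_samples, n_genes), one row per sample.
    n_iter : number of correlation rounds (hypothetical parameter).
    Returns an (n_samples, n_samples) matrix.
    """
    features = np.asarray(expr, dtype=float)
    for _ in range(n_iter):
        # np.corrcoef treats each row as one variable, so the result is
        # the sample-by-sample Pearson correlation matrix. Its rows become
        # the feature vectors used in the next round.
        features = np.corrcoef(features)
    return features


# Simulated data (an assumption for illustration): two groups of 5 samples
# over 200 genes, where the first group is shifted on the first 50 genes.
rng = np.random.default_rng(0)
signal = np.zeros((10, 200))
signal[:5, :50] = 1.0
expr = signal + rng.normal(scale=2.0, size=signal.shape)

corr_space = iterated_pearson(expr, n_iter=3)
print(corr_space.shape)
```

Under this reading, each sample ends up described by its correlations with every other sample rather than by raw expression values, and the resulting matrix could be passed to an existing clustering or classification routine in place of the original profiles.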