Nabavi Sheida, Schmolze Daniel, Maitituoheti Mayinuer, Malladi Sadhika, Beck Andrew H
Center for Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
Department of Pathology and Cancer Research Institute, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, MA, USA.
Bioinformatics. 2016 Feb 15;32(4):533-41. doi: 10.1093/bioinformatics/btv634. Epub 2015 Oct 29.
A major goal of biomedical research is to identify molecular features associated with a biological or clinical class of interest. Differential expression analysis has long been used for this purpose; however, conventional methods perform poorly when applied to data with high within-class heterogeneity.
To address this challenge, we developed EMDomics, a new method that uses the Earth mover's distance to measure the overall difference between the distributions of a gene's expression in two classes of samples and uses permutations to obtain a q-value for each gene. We applied EMDomics to the challenging problem of identifying genes associated with drug resistance in ovarian cancer. We also used simulated data to evaluate the performance of EMDomics in terms of sensitivity and specificity for identifying differentially expressed genes in classes with high within-class heterogeneity. In both the simulated and real biological data, EMDomics outperformed competing approaches for the identification of differentially expressed genes, and EMDomics was significantly more powerful than conventional methods for the identification of drug resistance-associated gene sets. EMDomics represents a new approach for the identification of genes differentially expressed between heterogeneous classes and has utility in a wide range of complex biomedical conditions in which sample classes show within-class heterogeneity.
The R package is available at http://www.bioconductor.org/packages/release/bioc/html/EMDomics.html.
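The core idea described in the abstract, scoring one gene by the Earth mover's distance between its expression distributions in two classes and then judging that score against label permutations, can be illustrated with a minimal Python sketch. This is not the EMDomics implementation: the function names `emd_1d` and `permutation_p_value` are hypothetical, the distance is computed directly on the empirical samples as the area between the two empirical CDFs (the package may bin or normalize differently), and the sketch returns a per-gene permutation p-value rather than the q-values reported by the method, which would additionally require correction across all genes.

```python
import bisect
import random


def emd_1d(a, b):
    """1-D earth mover's distance between two samples.

    Computed as the area between the two empirical CDFs.
    Illustrative helper, not the EMDomics routine.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    pooled = sorted(a_sorted + b_sorted)
    total = 0.0
    # Between consecutive pooled values both CDFs are constant,
    # so the area is a sum of rectangles.
    for left, right in zip(pooled, pooled[1:]):
        cdf_a = bisect.bisect_right(a_sorted, left) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, left) / len(b_sorted)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Return (observed EMD, permutation p-value) for one gene.

    Class labels are shuffled n_perm times; the p-value is the share
    of shuffles whose EMD is at least the observed one (with the usual
    +1 correction). Assumed scheme for illustration only.
    """
    rng = random.Random(seed)
    observed = emd_1d(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if emd_1d(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Made-up data: class B contains a shifted subgroup, mimicking
    # within-class heterogeneity.
    class_a = [rng.gauss(0.0, 1.0) for _ in range(40)]
    class_b = [rng.gauss(0.0, 1.0) for _ in range(25)] + \
              [rng.gauss(3.0, 1.0) for _ in range(15)]
    score, p = permutation_p_value(class_a, class_b, n_perm=500)
    print(f"EMD = {score:.3f}, permutation p = {p:.4f}")
```

Because the distance compares whole distributions rather than class means alone, a shifted subgroup inside one class contributes to the score even when the class means are close, which is the situation the abstract describes as high within-class heterogeneity.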