Bioinformatics and Functional Genomics Group, Cancer Research Center (CiC-IMBCC, CSIC/USAL/IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), Campus Miguel de Unamuno s/n, Salamanca, Spain.
Celgene Institute for Translational Research Europe (CITRE), Parque Científico y Tecnológico Cartuja 93, Sevilla, Spain.
Bioinformatics. 2019 Oct 1;35(19):3651-3662. doi: 10.1093/bioinformatics/btz148.
Patient and sample diversity is one of the main challenges when dealing with clinical cohorts in biomedical genomics studies. During the last decade, several methods have been developed to identify biomarkers assigned to specific individuals or subtypes of samples. However, current methods still fail to discover markers in complex scenarios where heterogeneity or hidden phenotypic factors are present. Here, we propose a method to analyze and understand heterogeneous data that avoids the classical normalization approaches of reducing or removing variation.
DEcomposing heterogeneous Cohorts using Omic data profiling (DECO) is a method to find significant associations between biological features (biomarkers) and samples (individuals) by analyzing large-scale omic data. The method identifies and categorizes biomarkers of specific phenotypic conditions based on a recurrent differential analysis integrated with a non-symmetrical correspondence analysis. DECO integrates both omic data dispersion and the predictor-response relationship from the non-symmetrical correspondence analysis into a single statistic (called the h-statistic), allowing the identification of closely related sample categories within complex cohorts. Performance is demonstrated using simulated data and five experimental transcriptomic datasets, and by comparison with seven other methods. We show that DECO greatly enhances the discovery and subtle identification of biomarkers, making it especially suited for deep and accurate patient stratification.
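To illustrate the idea of a recurrent differential analysis, the following is a minimal Python sketch under stated assumptions, not the DECO implementation (which is distributed as an R package). It repeatedly draws two small random groups of samples, flags features that differ strongly between them, and counts how often each sample falls in the higher-valued group of a flagged feature. The function names, the Welch t-statistic, the thresholds `t_cut` and `min_diff`, and the toy data are all hypothetical choices made for this example; the non-symmetrical correspondence analysis and the h-statistic are not reproduced here.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two lists of numbers (no p-value is computed)."""
    denom = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return 0.0 if denom == 0 else (mean(a) - mean(b)) / denom


def recurrent_differential(data, n_iter=2000, subset_size=3,
                           t_cut=8.0, min_diff=2.0, seed=0):
    """Toy recurrent differential analysis (illustrative assumption only).

    data: dict mapping feature name -> list of values, one per sample.
    Returns a dict mapping feature name -> per-sample counts of how often
    that sample was in the higher-valued group when the feature was flagged.
    """
    rng = random.Random(seed)
    n_samples = len(next(iter(data.values())))
    up_counts = {f: [0] * n_samples for f in data}
    for _ in range(n_iter):
        # Draw two disjoint random groups of samples, ignoring any labels.
        picked = rng.sample(range(n_samples), 2 * subset_size)
        g1, g2 = picked[:subset_size], picked[subset_size:]
        for f, values in data.items():
            a = [values[i] for i in g1]
            b = [values[i] for i in g2]
            # Require both a minimum effect size and a large t-statistic.
            if abs(mean(a) - mean(b)) < min_diff or abs(welch_t(a, b)) < t_cut:
                continue
            high = g1 if mean(a) > mean(b) else g2
            for i in high:
                up_counts[f][i] += 1
    return up_counts


# Toy cohort: samples 0-5 form a hidden subgroup with elevated "marker".
noise = random.Random(1)
data = {
    "marker": [noise.gauss(10 if i < 6 else 0, 0.5) for i in range(12)],
    "flat": [noise.gauss(5, 0.5) for _ in range(12)],
}
up_counts = recurrent_differential(data)
marker_hits = sum(up_counts["marker"])
flat_hits = sum(up_counts["flat"])
print("marker counts per sample:", up_counts["marker"])
print("flat counts per sample:  ", up_counts["flat"])
```

In this toy setting, the per-sample counts for a flagged feature concentrate on the samples of the hidden subgroup, which gives a feature-by-sample frequency table of the general kind that a correspondence analysis could then take as input.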
DECO is freely available as an R package (including a practical vignette) from the Bioconductor repository (http://bioconductor.org/packages/deco/).
Supplementary data are available at Bioinformatics online.