School of Interdisciplinary Informatics, University of Nebraska at Omaha, 110 S 67th St, Omaha, 68182, NE, USA.
BMC Bioinformatics. 2019 Nov 21;20(1):601. doi: 10.1186/s12859-019-3140-7.
High-throughput gene expression profiling has allowed the discovery of potential biomarkers that enable early diagnosis, prognosis, and the development of individualized treatment. However, it remains a challenge to identify a set of reliable and reproducible biomarkers across various gene expression platforms and laboratories for single-sample diagnosis and prognosis. We address this need with our Data-Driven Reference (DDR) approach, which employs stably expressed housekeeping genes as references to eliminate platform-specific biases and non-biological variability.
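The idea of using stably expressed genes as an internal reference can be illustrated with a minimal sketch. This is not the authors' DDR implementation: the function names `select_reference_genes` and `reference_relative`, the choice of log-scale standard deviation as the stability criterion, and the toy numbers are all assumptions made for illustration. The sketch only shows why expressing each gene relative to references measured in the same sample cancels a sample-wide multiplicative bias, such as a platform scaling factor.

```python
import numpy as np


def select_reference_genes(training, n_refs=2):
    """Return indices of the n_refs most stably expressed genes.

    `training` is a samples x genes matrix of strictly positive
    expression values. Stability is measured here (an assumption) as the
    standard deviation of log2 expression across samples.
    """
    log_expr = np.log2(training)
    variability = log_expr.std(axis=0)
    return np.argsort(variability)[:n_refs]


def reference_relative(sample, ref_idx):
    """Express each gene as a log2 ratio to the sample's own references.

    The reference level is the mean log2 expression of the reference
    genes within this one sample, so no other samples are needed.
    """
    log_sample = np.log2(sample)
    ref_level = log_sample[ref_idx].mean()
    return log_sample - ref_level


# Toy training data (hypothetical): genes 0 and 1 are stable, 2 and 3 vary.
training = np.array([
    [100.0, 50.0, 10.0, 400.0],
    [110.0, 52.0, 80.0, 90.0],
    [95.0, 48.0, 300.0, 20.0],
])
ref_idx = select_reference_genes(training, n_refs=2)

# The same biological sample "measured" on two platforms that differ
# only by a multiplicative scale factor (a deliberately simple bias model).
sample_a = np.array([100.0, 50.0, 10.0, 400.0])
sample_b = 3.7 * sample_a

features_a = reference_relative(sample_a, ref_idx)
features_b = reference_relative(sample_b, ref_idx)
```

Under this simple bias model, `features_a` and `features_b` agree up to floating-point error, because the scale factor appears in both the gene and its reference and cancels in the log ratio. Real platform differences are not purely multiplicative, so this should be read as an intuition for the reference-based design rather than a complete correction.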
Our method identifies biomarkers with "built-in" features that can be interpreted consistently regardless of profiling technology, which enables classification of single samples independent of platform. Validation with RNA-seq data from blood platelets shows that DDR achieves superior performance in classifying six different tumor types as well as molecular target statuses (such as MET- or HER2-positive, and mutant KRAS, EGFR or PIK3CA) with smaller sets of biomarkers. We demonstrate on three microarray datasets that our method can identify robust biomarkers for subgrouping medulloblastoma samples despite data perturbation introduced by different microarray platforms. In addition to identifying the majority of the subgroup-specific biomarkers in the nanoString CodeSet, our method detected some potential new biomarkers for subgrouping medulloblastoma.
In this study, we present a simple yet powerful data-driven method that contributes significantly to the identification of robust cross-platform gene signatures for single-patient disease classification, facilitating precision medicine. In addition, our method provides a new strategy for transcriptome analysis.