Phan John H, Yin-Goen Qiqin, Young Andrew N, Wang May D
Department of Biomedical Engineering at Georgia Tech and Emory University, Atlanta, GA, USA.
Annu Int Conf IEEE Eng Med Biol Soc. 2009;2009:4162-5. doi: 10.1109/IEMBS.2009.5333937.
Advances in high-throughput genomic and proteomic technology have led to growing interest in cancer biomarkers. These biomarkers can potentially improve the accuracy of cancer subtype prediction and, subsequently, the success of therapy. In this paper, we describe emerging technology for enabling translational bioinformatics by improving biomarker identification. Specifically, we present an application that uses prior knowledge to identify the most biologically relevant gene ranking algorithm. Identification of statistically and biologically relevant biomarkers from high-throughput data can be unreliable due to the nature of the data (e.g., high technical variability, small sample size, and high dimensionality). Furthermore, because few training samples are available, data-driven machine learning methods are often insufficient without the support of knowledge-based algorithms. As a case study, we apply these knowledge-driven methods to renal cancer data and identify genes that are potential biomarkers for cancer subtype classification.
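The idea of using prior knowledge to choose among gene ranking algorithms can be illustrated with a minimal sketch. The abstract does not specify the ranking methods, the knowledge source, or the scoring rule, so everything below is an assumption made for illustration: the two scoring functions, the top-k overlap measure, the function names, and the toy expression values and gene labels are all hypothetical.

```python
from statistics import mean, stdev

def fold_change_score(a, b):
    """Absolute difference of class means (hypothetical ranking method 1)."""
    return abs(mean(a) - mean(b))

def t_like_score(a, b):
    """Welch-style t statistic (hypothetical ranking method 2)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0

def rank_genes(data, score):
    """Order genes from highest to lowest score."""
    return sorted(data, key=lambda gene: score(*data[gene]), reverse=True)

def knowledge_overlap(ranking, known_genes, top_k):
    """Fraction of the top-k ranked genes that appear in the prior-knowledge set."""
    return len(set(ranking[:top_k]) & known_genes) / top_k

def select_ranker(data, rankers, known_genes, top_k):
    """Pick the ranking method whose top-k list agrees best with prior knowledge."""
    scores = {
        name: knowledge_overlap(rank_genes(data, fn), known_genes, top_k)
        for name, fn in rankers.items()
    }
    return max(scores, key=scores.get), scores

# Toy data: gene -> (expression in subtype 1, expression in subtype 2).
toy_data = {
    "GENE_A": ([1.0, 1.1, 0.9], [3.0, 3.1, 2.9]),
    "GENE_B": ([1.0, 5.0, 9.0], [2.0, 8.0, 14.0]),  # large but noisy difference
    "GENE_C": ([2.0, 2.1, 1.9], [2.5, 2.6, 2.4]),
    "GENE_D": ([4.0, 4.0, 4.1], [4.0, 4.1, 4.0]),
}
known = {"GENE_A", "GENE_C"}  # stand-in for a curated knowledge base
rankers = {"fold_change": fold_change_score, "t_like": t_like_score}

best, scores = select_ranker(toy_data, rankers, known, top_k=2)
print(best, scores)
```

In this toy setting the noisy gene inflates the fold-change ranking, so the variance-aware statistic agrees better with the known genes. With real data the knowledge set, the candidate rankers, and the value of `top_k` would all need to be chosen with care.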