Cindy Perscheid, Bastien Grasnick, Matthias Uflacker
Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany.
J Integr Bioinform. 2018 Dec 22;16(1):20180064. doi: 10.1515/jib-2018-0064.
Advances in high-throughput RNA sequencing techniques enable researchers to analyze the complete gene activity of specific cells. From the insights of such analyses, researchers can identify disease-specific expression profiles, thereby gain an understanding of complex diseases such as cancer, and eventually develop effective measures for diagnosis and treatment. The high dimensionality of gene expression data poses challenges to its computational analysis, which are addressed by gene selection. Traditional gene selection approaches base their findings on statistical analyses of the actual expression levels, which has several drawbacks when it comes to accurately identifying the underlying biological processes. In contrast, integrative approaches incorporate curated information on biological processes from external knowledge bases during gene selection, which promises better interpretability and improved predictive performance. Our work compares the performance of traditional and integrative gene selection approaches. Moreover, we propose a straightforward approach to integrating external knowledge with traditional gene selection approaches. We introduce a framework that enables automatic external knowledge integration, gene selection, and evaluation. Evaluation results show that our framework is a useful evaluation tool and that integrating external knowledge improves overall analysis results.
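The abstract contrasts traditional gene selection, which ranks genes by statistics computed on expression levels, with integrative selection, which draws on external knowledge bases. The Python sketch below is a toy illustration of that contrast. It is not the authors' framework or their proposed integration method. It assumes a two-class expression matrix and uses Welch's t-statistic as one example of a traditional filter. It then reweights the scores of genes listed in a hypothetical curated gene set (`knowledge`) by a hypothetical `bonus` multiplier, which is one simple way to integrate external knowledge. All function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic comparing two groups of expression values."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def traditional_selection(expr, labels, k):
    """Traditional (purely statistical) filter: rank genes by |t|, keep top-k.

    expr:   dict mapping gene name -> list of expression values, one per sample
    labels: list of 0/1 class labels aligned with the sample order
    Returns (top-k gene names, dict of per-gene |t| scores).
    """
    scores = {}
    for gene, values in expr.items():
        group1 = [v for v, y in zip(values, labels) if y == 1]
        group0 = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(welch_t(group1, group0))
    # Sort by descending score; break ties by gene name for determinism.
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return ranked[:k], scores


def integrative_selection(expr, labels, k, knowledge, bonus=2.0):
    """Toy integrative variant: boost statistical scores of 'known' genes.

    knowledge: hypothetical set of genes reported in a curated knowledge base
    bonus:     hypothetical multiplier applied to the scores of known genes
    """
    _, scores = traditional_selection(expr, labels, len(expr))
    weighted = {
        gene: score * (bonus if gene in knowledge else 1.0)
        for gene, score in scores.items()
    }
    ranked = sorted(weighted, key=lambda g: (-weighted[g], g))
    return ranked[:k]
```

Consider toy data with `labels = [1, 1, 1, 0, 0, 0]`. A gene that ranks second on the t-statistic alone can move to the top once it appears in `knowledge`. Multiplicative reweighting differs from filtering the candidates down to known genes only, because genes absent from the knowledge base can still be selected when their statistical evidence is strong.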