Luo Yuan, Riedlinger Gregory, Szolovits Peter
Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA.
Department of Pathology, Massachusetts General Hospital, Boston, MA, USA.
Cancer Inform. 2014 Oct 13;13(Suppl 1):69-79. doi: 10.4137/CIN.S13874. eCollection 2014.
Prioritization of cancer-implicated genes has received growing attention as an effective way to reduce wet-lab costs: computational analysis ranks candidate genes according to the likelihood that experimental verification will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expression, functional annotations, gene regulation, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view of cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions in which more effective text mining algorithms may improve the overall prioritization task and in which prioritizing pathways may be more desirable than prioritizing genes alone.