Nuzzo Angelo, Carapezza Giovanni, Di Bella Sebastiano, Pulvirenti Alfredo, Isacchi Antonella, Bosotti Roberta
Business Unit Oncology, Nerviano Medical Sciences srl, Nerviano, MI, 20014, Italy.
Department of Bioengineering, University of Applied Sciences, Vienna, 1190, Austria.
BMC Bioinformatics. 2016 Nov 8;17(Suppl 12):340. doi: 10.1186/s12859-016-1188-1.
Kinase over-expression and activation as a consequence of gene amplification or gene fusion events is a well-known mechanism of tumorigenesis. The search for novel rearrangements of kinases or other druggable genes may contribute to understanding the biology of carcinogenesis, and may lead to the identification of new candidate targets for drug discovery. However, this requires the ability to query large datasets to identify rare events occurring in very small fractions (1-3%) of different tumor subtypes. This task differs from the one addressed by conventional tools, which find genes differentially expressed between two experimental conditions.
We propose a computational method for the automatic identification of genes that are selectively over-expressed in a very small fraction of samples within a specific tissue. The method does not require a healthy counterpart or a reference sample for the analysis and can therefore also be applied to transcriptional data generated from cell lines. In our implementation the tool can use gene-expression data from microarray experiments as well as data generated by RNA-Seq technologies.
The method was implemented as a publicly available, user-friendly tool called KAOS (Kinase Automatic Outliers Search). The tool automates iterative searches for extreme outliers and the graphical visualization of the results. Filters can be applied to select the most significant outliers. The performance of the tool was evaluated using a synthetic dataset and compared to state-of-the-art tools. KAOS performs particularly well in detecting genes that are over-expressed in only a few samples, or when an extreme outlier stands out against a highly variable expression background. To validate the method on real case studies, we used publicly available tumor cell line microarray data and were able to identify genes known to be over-expressed in specific samples, as well as novel ones.
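The abstract does not specify the statistic KAOS uses, so the following is only a minimal sketch of what an iterative, reference-free search for extreme high outliers could look like. It substitutes a generic Tukey-fence rule (values above Q3 + k × IQR) for the actual KAOS criterion. The function name `extreme_high_outliers` and the parameters `k` and `max_fraction` are hypothetical and chosen for illustration; `max_fraction=0.03` merely mirrors the 1-3% figure quoted above.

```python
from statistics import quantiles


def extreme_high_outliers(values, k=3.0, max_fraction=0.03):
    """Return indices of samples that are extreme high outliers for one gene.

    Illustrative stand-in, NOT the KAOS algorithm: a sample is flagged when
    its expression exceeds Q3 + k * IQR, with quartiles recomputed on the
    samples that remain after each flagged sample is removed.
    """
    remaining = list(range(len(values)))
    flagged = []
    # Hypothetical filter: keep the gene only if outliers are rare.
    limit = max(1, int(max_fraction * len(values)))

    while len(remaining) >= 4:
        q1, _, q3 = quantiles([values[i] for i in remaining], n=4)
        fence = q3 + k * (q3 - q1)
        top = max(remaining, key=lambda i: values[i])
        if values[top] <= fence:
            break  # the highest remaining sample is no longer extreme
        flagged.append(top)
        remaining.remove(top)
        if len(flagged) > limit:
            return []  # too many high samples: not a rare event
    return flagged


# Synthetic gene: 98 samples near 5.0-5.4, two samples strongly over-expressed.
expression = [5.0 + 0.1 * (i % 5) for i in range(98)] + [12.0, 11.5]
print(extreme_high_outliers(expression))
```

Because the fence is computed from the distribution of the gene across the samples themselves, no healthy or reference sample is needed, which is the property the method description emphasizes. Applying the function gene by gene over an expression matrix would give a candidate list that further filters could then rank.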