Pongor Lőrinc, Kormos Máté, Hatzis Christos, Pusztai Lajos, Szabó András, Győrffy Balázs
MTA TTK Lendület Cancer Biomarker Research Group, Research Centre for Natural Sciences, Magyar tudósok körútja 2, Budapest, H-1117, Hungary.
2nd Department of Pediatrics, Semmelweis University, Budapest, Hungary.
Genome Med. 2015 Oct 16;7:104. doi: 10.1186/s13073-015-0228-1.
Using somatic mutations to predict clinical outcome is difficult because a single mutation can indirectly influence the function of many genes, and because clinical follow-up is sparse in the relatively young next-generation sequencing (NGS) databanks. Here we approach this problem by linking sequence databanks to well-annotated gene-chip datasets, using a multigene transcriptomic fingerprint as the link between gene mutations and gene expression in breast cancer patients.
The database consists of 763 NGS samples containing mutational status for 22,938 genes and RNA-seq data for 10,987 genes. The gene chip database contains 5,934 patients with 10,987 genes plus clinical characteristics. For the prediction, mutations present in a sample are first translated into a 'transcriptomic fingerprint' by running ROC analysis on mutation and RNA-seq data. Then correlation to survival is assessed by computing Cox regression for both up- and downregulated signatures.
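The fingerprint step described above can be illustrated with a minimal sketch. It assumes that each gene's ROC analysis is summarized by its area under the curve (AUC) for separating mutated from wild-type samples, and that genes are assigned to the up- or downregulated signature by an AUC cutoff. The function names, the 0.7 cutoff, the mean-expression score, and the toy data are all illustrative assumptions, not the authors' implementation; the subsequent Cox regression on the gene-chip cohort is not shown.

```python
from statistics import mean

def auc(mutated, wild_type):
    """Area under the ROC curve: the probability that a randomly chosen
    mutated sample has higher expression than a wild-type one (ties count 0.5)."""
    wins = 0.0
    for m in mutated:
        for w in wild_type:
            if m > w:
                wins += 1.0
            elif m == w:
                wins += 0.5
    return wins / (len(mutated) * len(wild_type))

def fingerprint(expression, is_mutated, cutoff=0.7):
    """Split genes into up- and downregulated signatures by AUC.

    expression: {gene: [expression value per sample]}
    is_mutated: [bool per sample], True if the sample carries the mutation
    cutoff:     hypothetical AUC threshold, chosen only for illustration
    """
    up, down = [], []
    for gene, values in expression.items():
        mut = [v for v, m in zip(values, is_mutated) if m]
        wt = [v for v, m in zip(values, is_mutated) if not m]
        a = auc(mut, wt)
        if a >= cutoff:
            up.append(gene)
        elif a <= 1 - cutoff:
            down.append(gene)
    return up, down

def signature_score(patient_expression, genes):
    """Mean expression of the signature genes in one gene-chip patient
    (one simple way to carry a signature over to the microarray cohort)."""
    return mean(patient_expression[g] for g in genes)

# Toy data: five sequenced samples, the first two carry the mutation.
is_mutated = [True, True, False, False, False]
expression = {
    "GENE_A": [9.0, 8.5, 3.0, 4.0, 3.5],  # higher in mutated samples
    "GENE_B": [1.0, 1.5, 6.0, 5.5, 7.0],  # lower in mutated samples
    "GENE_C": [5.0, 4.0, 4.5, 5.5, 4.0],  # no clear separation
}
up, down = fingerprint(expression, is_mutated)
```

In a sketch like this, the per-patient signature scores for the up- and downregulated gene sets would then serve as the covariates in separate survival models on the gene-chip cohort.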
According to this approach, the top driver oncogenes having a mutation prevalence over 5% included AKT1, TRANK1, TRAPPC10, RPGR, COL6A2, RAPGEF4, ATG2B, CNTRL, NAA38, OSBPL10, POTEF, SCLT1, SUN1, VWDE, MTUS2, and PIK3CA, and the top tumor suppressor genes included PHEX, TP53, GGA3, RGS22, PXDNL, ARFGEF1, BRCA2, CHD8, GCC2, and ARMC4. The system was validated by computing the correlation between RNA-seq and microarray data (r² = 0.73, P < 1E-16). Cross-validation using 20 genes with a prevalence of approximately 5% confirmed analysis reproducibility.
We established a pipeline enabling rapid clinical validation of a discovered mutation in a large breast cancer cohort. An online interface is available for evaluating any human gene mutation, or combinations of up to three such genes (http://www.g-2-o.com).