Paciello Giulia, Ficarra Elisa
Department of Control and Computer Engineering DAUIN, Politecnico di Torino, C.so Duca degli Abruzzi 24, Turin, 10129, Italy.
BMC Bioinformatics. 2017 Jan 23;18(1):58. doi: 10.1186/s12859-016-1450-6.
The latest Next Generation Sequencing technologies have opened a new era of genomic studies, yielding novel insights into multifactorial pathologies such as cancer. In particular, these methods have greatly improved the detection and understanding of gene fusions. However, gene fusion identification remains challenging even for state-of-the-art algorithms. They report huge numbers of poorly overlapping candidates, and because every reported fusion would have to be considered for in-lab validation, the candidate lists clearly overwhelm wet-lab capabilities.
In this work we propose FuGePrior, a novel methodological approach and tool for prioritizing gene fusions from paired-end RNA-Seq data. The pipeline combines state-of-the-art tools for chimeric transcript discovery and prioritization, a series of filtering and processing steps designed in light of the recent literature on gene fusions, and an analysis of the functional reliability of the gene fusion structure.
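The general idea of combining several discovery tools with filtering and a structural check can be sketched as follows. This is only an illustrative sketch, not FuGePrior's actual implementation: the record fields, the thresholds (`min_tools`, `min_reads`), the in-frame test used as a stand-in for functional reliability, and the placeholder tool and gene names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionCandidate:
    """One fusion call from one discovery tool (hypothetical record layout)."""
    gene5: str             # 5' partner gene
    gene3: str             # 3' partner gene
    tool: str              # name of the tool that reported the call
    supporting_reads: int  # reads supporting the junction
    in_frame: bool         # assumed proxy for a functionally reliable structure


def prioritize(candidates, min_tools=2, min_reads=3):
    """Return gene pairs that pass all (assumed) prioritization criteria."""
    # Group the calls by gene pair so that agreement across tools can be counted.
    by_pair = {}
    for c in candidates:
        by_pair.setdefault((c.gene5, c.gene3), []).append(c)

    kept = []
    for pair, calls in by_pair.items():
        n_tools = len({c.tool for c in calls})
        best_reads = max(c.supporting_reads for c in calls)
        structurally_ok = any(c.in_frame for c in calls)
        if n_tools >= min_tools and best_reads >= min_reads and structurally_ok:
            kept.append(pair)
    return sorted(kept)


# Placeholder data: tool and gene names are not real results.
calls = [
    FusionCandidate("GENE_A", "GENE_B", "tool_1", 12, True),
    FusionCandidate("GENE_A", "GENE_B", "tool_2", 9, True),
    FusionCandidate("GENE_C", "GENE_D", "tool_1", 2, False),
]
priority_pairs = prioritize(calls)
```

In this toy input only the pair reported by both placeholder tools survives; the real pipeline's criteria are described in the paper, not here.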
FuGePrior performance has been assessed on two publicly available paired-end RNA-Seq datasets. The first, by Edgren and colleagues, includes four breast cancer cell lines and a normal breast sample, whereas the second, by Ren and colleagues, comprises fourteen primary prostate cancer samples and their paired normal counterparts. FuGePrior reduced the number of fusions output by the chimeric transcript discovery tools by 65 to 75%, depending on the breast cancer cell line considered, and by 37 to 65%, depending on the prostate cancer sample under examination. Furthermore, since both datasets come with a partial validation, we were able to assess the performance of FuGePrior in correctly prioritizing real gene fusions. Specifically, 25 out of 26 validated fusions in the breast cancer dataset were correctly labelled as reliable and biologically significant. In the prostate dataset, 2 out of 5 validated fusions were recognized as priority by FuGePrior.
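The two kinds of figures reported above (percentage reduction in candidates, and the share of validated fusions retained) can be reproduced with simple arithmetic. The sketch below uses hypothetical function names; the validated-fusion counts come from the abstract, while the before/after candidate counts in the last line are made-up numbers used only to illustrate the formula.

```python
def reduction_percent(n_before: int, n_after: int) -> float:
    """Percentage of candidate fusions removed by prioritization."""
    if n_before <= 0:
        raise ValueError("n_before must be positive")
    return 100.0 * (n_before - n_after) / n_before


def retained_fraction(n_prioritized: int, n_validated: int) -> float:
    """Fraction of experimentally validated fusions kept as priority."""
    if n_validated <= 0:
        raise ValueError("n_validated must be positive")
    return n_prioritized / n_validated


# Counts stated in the abstract.
breast_retained = retained_fraction(25, 26)   # about 0.96
prostate_retained = retained_fraction(2, 5)   # 0.4

# Illustrative counts only (not from the paper): 200 candidates cut to 50.
example_reduction = reduction_percent(200, 50)
```

Expressed as percentages, the breast cancer dataset retains about 96% of its validated fusions and the prostate dataset 40%.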