The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA.
Division of Hematology, Oncology, Blood and Marrow Transplant, Nationwide Children's Hospital, Columbus, OH, USA.
BMC Genomics. 2021 Dec 4;22(1):872. doi: 10.1186/s12864-021-08094-z.
Pediatric cancers typically have a distinct genomic landscape compared to adult cancers and frequently carry somatic gene fusion events that alter gene expression and drive tumorigenesis. Sensitive and specific detection of gene fusions from next-generation RNA sequencing (RNA-Seq) data is computationally challenging and may be confounded by low tumor cellularity or underlying genomic complexity. Numerous computational tools are available to identify fusions from supporting RNA-Seq reads, yet each algorithm varies in sensitivity and precision, and no clearly superior approach currently exists. To overcome these challenges, we have developed an ensemble fusion calling approach to increase the accuracy of fusion identification.
Our Ensemble Fusion (EnFusion) approach utilizes seven fusion calling algorithms: Arriba, CICERO, FusionMap, FusionCatcher, JAFFA, MapSplice, and STAR-Fusion, which are packaged as a fully automated pipeline using Docker and Amazon Web Services (AWS) serverless technology. The method takes paired-end RNA-Seq reads as input, and the output from each algorithm is examined to identify fusions detected by a consensus of at least three algorithms. These consensus fusion results are filtered against an internal database to remove likely artifactual fusions that occur at high frequency in our internal cohort, while a "known fusion list" ensures that known pathogenic events are still reported. We have employed the EnFusion pipeline on RNA-Seq data from 229 patients with pediatric cancer or blood disorders studied under an IRB-approved protocol. The samples consist of 138 central nervous system tumors, 73 solid tumors, and 18 hematologic malignancies or disorders. The combination of an ensemble fusion-calling pipeline and a knowledge-based filtering strategy identified 67 clinically relevant fusions in our cohort (diagnostic yield of 29.3%), including RBPMS-MET, BCAN-NTRK1, and TRIM22-BRAF fusions. Following clinical confirmation and reporting in the patient's medical record, both known and novel fusions provided medically meaningful information.
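The consensus-and-filter logic described above can be illustrated with a minimal Python sketch. This is not the EnFusion implementation: the function name, data layout, placeholder gene names, and the 10% cohort-frequency cutoff are all hypothetical, and the sketch assumes that fusions on the known fusion list bypass both the consensus threshold and the frequency filter, which the text does not specify. Only the threshold of three callers and the cohort counts (67 fusions, 229 patients) come from the text.

```python
from collections import defaultdict

MIN_CALLERS = 3         # consensus threshold stated in the text
MAX_COHORT_FREQ = 0.10  # hypothetical cutoff; the text gives no value


def consensus_fusions(calls_by_caller, cohort_freq, known_fusions,
                      min_callers=MIN_CALLERS, max_freq=MAX_COHORT_FREQ):
    """Return {fusion: sorted supporting callers} for fusions to report."""
    # Collect the set of callers that support each fusion.
    support = defaultdict(set)
    for caller, fusions in calls_by_caller.items():
        for fusion in fusions:
            support[fusion].add(caller)

    reported = {}
    for fusion, callers in support.items():
        has_consensus = len(callers) >= min_callers
        # Fusions seen too often in the internal cohort are treated as artifacts.
        is_recurrent = cohort_freq.get(fusion, 0.0) > max_freq
        # Assumption: known pathogenic fusions are exempt from both filters.
        is_known = fusion in known_fusions
        if is_known or (has_consensus and not is_recurrent):
            reported[fusion] = sorted(callers)
    return dict(sorted(reported.items()))


# Toy input with placeholder gene names (not real results).
calls = {
    "Arriba": {"GENE1-GENE2", "GENE3-GENE4", "GENE5-GENE6"},
    "STAR-Fusion": {"GENE1-GENE2", "GENE3-GENE4"},
    "FusionCatcher": {"GENE1-GENE2", "GENE3-GENE4"},
    "JAFFA": {"GENE7-GENE8"},
}
cohort_freq = {"GENE3-GENE4": 0.40}  # recurrent in the cohort, likely artifact
known = {"GENE7-GENE8"}              # on the known fusion list

result = consensus_fusions(calls, cohort_freq, known)
print(result)

# Diagnostic yield from the figures reported in the text: 67 of 229.
diagnostic_yield = 67 / 229
print(f"{diagnostic_yield:.1%}")
```

In this toy run, GENE1-GENE2 is kept because three callers agree, GENE3-GENE4 is dropped despite consensus because it recurs in the cohort, GENE5-GENE6 is dropped for lack of consensus, and GENE7-GENE8 is kept only because it is on the known list.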
The EnFusion pipeline offers a streamlined approach to discovering fusions in cancer, with higher sensitivity and accuracy than single-algorithm methods. Furthermore, this method accurately identifies driver fusions in pediatric cancer, providing clinical impact by contributing evidence to diagnosis and, when appropriate, indicating targeted therapies.