Detroja Rajesh, Gorohovski Alessandro, Giwa Olawumi, Baum Gideon, Frenkel-Morgenstern Milana
Cancer Genomics and BioComputing of Complex Diseases Lab, Azrieli Faculty of Medicine, Bar-Ilan University, Safed 1311502, Israel.
NAR Genom Bioinform. 2021 Nov 26;3(4):lqab112. doi: 10.1093/nargab/lqab112. eCollection 2021 Dec.
Fusion genes, or chimeras, typically comprise sequences from two different genes. The chimeric RNAs transcribed from such joined sequences often serve as cancer drivers. Identifying such driver fusions in a given cancer or complex disease is important for diagnosis and treatment. The advent of next-generation sequencing technologies, such as DNA-Seq and RNA-Seq, together with the development of suitable computational tools, has made the global identification of chimeras in tumors possible. However, testing of over 20 computational methods showed them to be limited in chimera prediction sensitivity, specificity, and accurate quantification of junction reads. These shortcomings motivated us to develop the first 'reference-based' approach, termed ChiTaH (Chimeric Transcripts from High-throughput sequencing data). ChiTaH uses 43,466 non-redundant known human chimeras as a reference database to which sequencing reads are mapped, so that chimeric reads can be identified accurately. We benchmarked ChiTaH and four other methods for identifying human chimeras on both simulated and real sequencing datasets. ChiTaH was the most accurate and fastest method for identifying known human chimeras in both simulated and real sequencing datasets. Moreover, ChiTaH uncovered heterogeneity of the BCR-ABL1 chimera in both bulk and single-cell samples of the K-562 cell line, which was confirmed experimentally.
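The abstract says that ChiTaH maps reads against a reference of known chimeric sequences and quantifies junction reads. It does not define what counts as a junction read. The Python sketch below shows one common criterion under stated assumptions and is not ChiTaH's implementation. The `Alignment` record, the `spans_junction` and `count_junction_reads` helpers, the `min_anchor` threshold of 10 bases, and the junction coordinate in the usage lines are all hypothetical. Under this criterion, a read aligned to a chimeric reference counts as a junction read only if it covers at least `min_anchor` bases on each side of the fusion breakpoint.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record: one read aligned to one chimeric reference sequence."""
    chimera_id: str
    start: int  # 0-based leftmost aligned reference position
    end: int    # 0-based exclusive end of the aligned reference span


def spans_junction(aln: Alignment, junction: int, min_anchor: int) -> bool:
    """Return True if the read covers >= min_anchor bases on both sides of the junction.

    `junction` is assumed to be the 0-based reference position of the first base
    of the 3' partner gene, so [start, junction) comes from the 5' partner and
    [junction, end) from the 3' partner.
    """
    left_overhang = junction - aln.start
    right_overhang = aln.end - junction
    return left_overhang >= min_anchor and right_overhang >= min_anchor


def count_junction_reads(alignments, junctions, min_anchor=10):
    """Count junction-spanning reads per chimera.

    `junctions` maps a chimera ID to its (assumed) junction coordinate;
    alignments to chimeras without a known junction are skipped.
    """
    counts = Counter()
    for aln in alignments:
        junction = junctions.get(aln.chimera_id)
        if junction is not None and spans_junction(aln, junction, min_anchor):
            counts[aln.chimera_id] += 1
    return counts


# Illustrative usage with a made-up junction coordinate (not the real BCR-ABL1 breakpoint).
junctions = {"BCR-ABL1": 500}
alignments = [
    Alignment("BCR-ABL1", 450, 550),  # 50 bases on each side: junction read
    Alignment("BCR-ABL1", 495, 595),  # only 5 bases on the 5' side: rejected
    Alignment("BCR-ABL1", 100, 200),  # entirely within the 5' partner: rejected
]
print(count_junction_reads(alignments, junctions))  # Counter({'BCR-ABL1': 1})
```

The minimum-anchor requirement is a design choice. It guards against reads that overlap the breakpoint by only a few bases, because such short overhangs could also align to the unfused partner gene by chance.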