School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
School of Public Health Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad314.
Microhomology-mediated end joining (MMEJ), an error-prone DNA damage repair mechanism, frequently leads to chromosomal rearrangements because of its promiscuous end joining in the context of genomic instability, and it also increases the mutational load in the sequences flanking the breakpoints (BPs). In this study, we systematically investigated the homologous sequences around the genomic breakpoint regions of human fusion genes, which are formed by chromosomal rearrangements initiated by DNA double-strand breaks. Because RNA-seq is the typical data source for detecting fusion genes, the known fusion breakpoints are identified at exon junctions, and the most likely genomic breakpoint regions have to be inferred. To do so, we used the regions with high feature importance scores calculated by our recently developed fusion BP prediction model, FusionAI, and identified 151K microhomologies among ~24K fusion BPs in 20K fusion genes. Across multiple bioinformatics analyses, we found a relationship between sequence homologies and the immune system. This in silico study provides new knowledge about the sequence homologies around coded structural variants.
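To make the idea of a microhomology concrete, the sketch below finds short exact sequence matches shared between two flanking windows, one from each fusion partner. It is only an illustration and is not FusionAI or the authors' pipeline: the function name `find_microhomologies`, the length thresholds, and the toy sequences are all hypothetical, and it assumes that a microhomology can be approximated as a maximal exact match between the two windows.

```python
def find_microhomologies(left_flank, right_flank, min_len=2, max_len=20):
    """Return (start_in_left, start_in_right, sequence) for each maximal
    exact match shared by the two flanking windows.

    Hypothetical helper for illustration; thresholds are assumptions.
    """
    a, b = left_flank.upper(), right_flank.upper()
    # lcs[i][j] = length of the longest common suffix of a[:i] and b[:j]
    lcs = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    hits = []
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] != b[j - 1]:
                continue
            lcs[i][j] = lcs[i - 1][j - 1] + 1
            # Report a match only if it cannot be extended to the right,
            # so shorter sub-matches of the same run are not repeated.
            extendable = i < len(a) and j < len(b) and a[i] == b[j]
            length = lcs[i][j]
            if not extendable and min_len <= length <= max_len:
                hits.append((i - length, j - length, a[i - length:i]))
    return hits


if __name__ == "__main__":
    # Toy windows around two hypothetical breakpoints.
    print(find_microhomologies("TTGACCGTAA", "GGACCGTCTT", min_len=3))
    # -> [(2, 1, 'GACCGT')]
```

The quadratic table is adequate for the short windows typically inspected around a breakpoint; for genome-scale searches a suffix-based index would be a more suitable choice.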