Subramanian Sandeep, Chaparala Srilakshmi, Avali Viji, Ganapathiraju Madhavi K
Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.
Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Blvd, Suite 522, Pittsburgh, PA, 15206, USA.
BMC Med Genomics. 2016 Dec 5;9(Suppl 3):73. doi: 10.1186/s12920-016-0232-3.
DNA palindromes are a distinctive class of repeat sequences in the human genome. A palindrome is a sequence of nucleotides whose second half is the complement of the first half in reverse order. Palindromic sequences may play a significant role in DNA replication, transcription and gene regulation. They occur frequently in human cancers, clustering at specific genomic locations associated with gene amplification and tumorigenesis. Moreover, some studies have shown that palindromes cluster in amplified regions of breast cancer genomes, especially on chromosomes (chr) 8 and 11. With large numbers of personal genomes and cancer genomes becoming available, it is now possible to study the association of palindromes with disease using computational methods. Here, we conducted a pilot study on chromosomes 8 and 11 of cancer genomes to computationally identify differentially occurring palindromes.
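The definition of a DNA palindrome given above can be illustrated with a minimal Python sketch. It is not the BLMT implementation, which is not described here; the function names, the length bounds, and the brute-force window scan are assumptions chosen for clarity on short sequences rather than efficiency on whole chromosomes.

```python
# Illustrative sketch only: names and parameters are hypothetical,
# not taken from BLMT or from the pipeline described in this abstract.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq):
    """True if the second half is the reverse complement of the first half.

    Equivalently, an even-length sequence equals its own reverse complement.
    Sequences containing other symbols (for example N) are rejected.
    """
    seq = seq.upper()
    if len(seq) == 0 or len(seq) % 2 != 0:
        return False
    if not set(seq) <= set("ACGT"):
        return False
    return seq == reverse_complement(seq)


def find_palindromes(seq, min_len=4, max_len=12):
    """Brute-force scan returning (start, palindrome) for every even length
    between min_len and max_len (assumed bounds, inclusive)."""
    seq = seq.upper()
    first_even = min_len if min_len % 2 == 0 else min_len + 1
    hits = []
    for length in range(first_even, max_len + 1, 2):
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            if is_dna_palindrome(window):
                hits.append((start, window))
    return hits
```

For example, `is_dna_palindrome("GAATTC")` holds because the reverse complement of `GAATTC` is `GAATTC` itself, whereas an odd-length sequence such as `GATTACA` cannot satisfy the definition.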
We processed 69 breast cancer genomes from The Cancer Genome Atlas, including serum-normal and tumor genomes, together with genomes from the 1000 Genomes Project to serve as a control group. The Biological Language Modelling Toolkit (BLMT) computes palindromes in whole genomes. We developed a computational pipeline that integrates BLMT to compute and compare the prevalence of palindromes in personal genomes.
We carried out a pilot study on chr 8 and chr 11, taking into account single nucleotide polymorphisms, insertions and deletions. Of the palindromes near breast cancer genes that showed any variation in cancer genomes, 38% were among the most differentiated palindromes in tumor (i.e. they ranked in the top 25% by our heuristic measure).
These results will shed light on the prevalence of palindromes in oncogenes and on the mutations in palindromic regions that could contribute to genomic rearrangements and breast cancer progression.