DNA sequence features underlying large-scale duplications and deletions in human.

Author information

Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Wroclaw, Poland.

Publication information

J Appl Genet. 2022 Sep;63(3):527-533. doi: 10.1007/s13353-022-00704-0. Epub 2022 May 20.

Abstract

Copy number variants (CNVs) may cover up to 12% of the whole genome and have a substantial impact on phenotypes. We used 5,867 duplications and 33,181 deletions available from the 1000 Genomes Project to characterise genomic regions vulnerable to CNV formation and to identify sequence features characteristic of those regions. The GC content of deletions was lower, and that of duplications higher, than in randomly selected regions. In regions flanking deletions and downstream of duplications, GC content was higher than in the random sequences, but upstream of duplications it was lower. Within duplications and downstream of deletions, the percentage of low-complexity sequences did not differ from the randomised data. Within deletions and upstream of CNVs, it was higher, whereas downstream of duplications it was lower than in random sequences. The majority of CNVs intersected with genic regions, mainly with introns. GC content may be associated with CNV formation, and CNVs, especially duplications, are initiated in low-complexity regions. Moreover, CNVs located within or overlapping introns indicate their role in shaping intron variability. Genic CNV regions were enriched in many essential biological processes such as cell adhesion, synaptic transmission, transport, cytoskeleton organization, immune response and metabolic mechanisms, which indicates that these large-scale variants play important biological roles.
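
The comparison of GC content between CNV regions and randomly selected regions can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`gc_content`, `random_region_gc`, `empirical_p_higher`), the length-matched random-window sampling, and the empirical p-value are all assumptions made for illustration, and the abstract does not state which statistical test was actually used.

```python
import random


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored; returns 0.0 if no
    unambiguous base is present.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def random_region_gc(genome: str, length: int, n: int,
                     rng: random.Random) -> list[float]:
    """GC content of n randomly placed windows of the given length.

    Hypothetical null model: windows are sampled uniformly from a
    single sequence, matched to the CNV only by length.
    """
    if length <= 0 or length > len(genome):
        raise ValueError("window length must be in 1..len(genome)")
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(n)]
    return [gc_content(genome[s:s + length]) for s in starts]


def empirical_p_higher(observed: float, null: list[float]) -> float:
    """Share of random windows whose GC content is >= the observed value."""
    return sum(v >= observed for v in null) / len(null)
```

In use, one would compute `gc_content` for each CNV (or for its upstream and downstream flanks), draw length-matched windows with `random_region_gc`, and compare the two distributions. A real analysis would also need to handle chromosomes, assembly gaps, and multiple testing, none of which this sketch covers.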

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9441/9365719/c958254a99a1/13353_2022_704_Fig1_HTML.jpg
