Structural and Functional Classification of G-Quadruplex Families within the Human Genome.

Affiliations

School of Graduate and Interdisciplinary Studies, University of Louisville, Louisville, KY 40292, USA.

Department of Neuroscience Training, University of Louisville, Louisville, KY 40292, USA.

Publication information

Genes (Basel). 2023 Mar 4;14(3):645. doi: 10.3390/genes14030645.

Abstract

G-quadruplexes (G4s) are short secondary DNA structures located throughout genomic DNA and transcribed RNA. Although G4 structures have been shown to form in vivo, no search tools are currently known that examine these structures against previously identified G-quadruplexes and filter them by similarity in sequence, structure, and thermodynamic properties. We present a framework for clustering G-quadruplex sequences into families using the CD-HIT, MeShClust, and DNACLUST methods along with a combination of Starcode and BLAST. Using this framework to filter and annotate clusters, we identified 95 families of G-quadruplex sequences within the human genome. A profile for each family was built with hidden Markov models so that additional family members can be identified and homology probability scores generated. The thermodynamic folding energy properties, functional annotation of genes associated with the sequences, scores from different prediction algorithms, and transcription factor binding motifs within a family were used to annotate and compare the diversity within and across clusters. The resulting set of G-quadruplex families can be used to further understand how different regions of the genome are regulated by factors targeting specific structures common to members of a specific cluster.
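To make the idea of grouping sequences into families concrete, the following is a minimal sketch of greedy incremental clustering, the general strategy behind tools such as CD-HIT. It is not the authors' pipeline: the `identity` and `greedy_cluster` functions, the 0.8 threshold, and the toy sequences are all hypothetical, and `difflib.SequenceMatcher` stands in for a real alignment-based identity measure.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, threshold=0.8):
    """Greedy incremental clustering (illustrative only).

    Sequences are visited longest first. Each one joins the first cluster
    whose representative it matches at or above `threshold`; otherwise it
    starts a new cluster and becomes that cluster's representative.
    """
    clusters = []  # list of (representative, members) pairs
    for seq in sorted(seqs, key=len, reverse=True):
        for rep, members in clusters:
            if identity(rep, seq) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


# Hypothetical toy sequences, not data from the paper.
toy_seqs = [
    "GGGTTAGGGTTAGGGTTAGGG",
    "GGGTTAGGGTTAGGGTTAGGA",
    "GGGCGGGCGGGCGGGC",
    "GGGCGGGCGGGCGGGA",
]
families = greedy_cluster(toy_seqs)
for rep, members in families:
    print(rep, len(members))
```

A real workflow would run the dedicated clustering tools named in the abstract and then build a profile per family; this sketch only shows why a similarity threshold and a processing order together determine which sequences end up in the same family.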

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a470/10048163/e620463b7c11/genes-14-00645-g001.jpg
