Lineage-associated underrepresented permutations (LAUPs) of mammalian genomic sequences based on a Jellyfish-based LAUPs analysis application (JBLA).

Affiliations

College of Computer Science, Sichuan University, Chengdu, China.

School of Computer and Information Science, Southwest University, Chongqing, China.

Publication information

Bioinformatics. 2018 Nov 1;34(21):3624-3630. doi: 10.1093/bioinformatics/bty392.

PMID: 29762634
Abstract

MOTIVATION

This study addresses several questions about naturally underrepresented sequences: (i) Are there permutations of real genomic DNA sequences of a defined length (k-mers) that, in a given lineage, do not actually exist or are underrepresented? (ii) If there are such sequences, what are their characteristics in terms of k-mer length and base composition? (iii) Are they related to the CpG or TpA underrepresentation known for human sequences? We propose that the answers to these questions are of great significance for the study of sequence-associated regulatory mechanisms, such as cytosine methylation and chromosomal structures, in physiological or pathological conditions such as cancer.
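Question (iii) refers to dinucleotide underrepresentation. A common way to quantify it is the observed/expected ratio, (count of XY × N) / (count of X × count of Y), where N is the sequence length. The sketch below is only an illustration of that ratio and is not taken from the paper; the function name `dinucleotide_obs_exp` is hypothetical.

```python
def dinucleotide_obs_exp(seq: str, dinuc: str = "CG") -> float:
    """Observed/expected ratio of a dinucleotide in one strand.

    Expected count assumes independent bases:
    count(X) * count(Y) / N, with N the sequence length.
    """
    s = seq.upper()
    x, y = dinuc[0].upper(), dinuc[1].upper()
    n = len(s)
    count_x = s.count(x)
    count_y = s.count(y)
    if n == 0 or count_x == 0 or count_y == 0:
        return 0.0  # ratio is undefined; report 0.0 by convention here
    # Scan every adjacent pair so overlapping cases (e.g. "AA") are counted too.
    observed = sum(1 for i in range(n - 1) if s[i] == x and s[i + 1] == y)
    return observed * n / (count_x * count_y)


if __name__ == "__main__":
    print(dinucleotide_obs_exp("ACGTCGCG", "CG"))
    print(dinucleotide_obs_exp("ATAT", "TA"))
```

A ratio well below 1 indicates that the dinucleotide occurs less often than its base composition alone would predict, which is the sense in which CpG and TpA are described as underrepresented.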

RESULTS

We empirically defined sequences that were not included in any well-known public databases as lineage-associated underrepresented permutations (LAUPs). We then developed a Jellyfish-based LAUPs analysis application (JBLA) to investigate LAUPs in 24 representative species. The findings include: (i) the shortest LAUPs range in length from 10 to 14 and collectively constitute a low proportion of the genome; (ii) common LAUPs show higher CG content over the analysed mammalian genome and possess distinct CG*CG motifs; (iii) neither CpG-containing LAUPs nor CpG island sequences are randomly structured or distributed over the genomes, and some LAUPs and most CpG-containing sequences exhibit opposite trends within the same k and n variants. In addition, we demonstrate that the JBLA algorithm is more efficient than the original Jellyfish for computing LAUPs.
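The idea of k-mers that are absent from a sequence, the shortest k at which any are absent, and the absent k-mers shared across several sequences can be sketched with a brute-force enumeration. This is not the JBLA implementation, which builds on Jellyfish; it is a toy, in-memory illustration that is practical only for small k and short sequences. All function names are hypothetical, and counting both strands (reverse complements) is an assumption.

```python
from itertools import product

ALPHABET = frozenset("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(kmer: str) -> str:
    """Reverse complement of a DNA k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]


def observed_kmers(seq: str, k: int) -> set:
    """All k-mers seen on either strand; windows with non-ACGT bases are skipped."""
    s = seq.upper()
    seen = set()
    for i in range(len(s) - k + 1):
        kmer = s[i:i + k]
        if ALPHABET.issuperset(kmer):
            seen.add(kmer)
            seen.add(revcomp(kmer))  # assumption: treat both strands as present
    return seen


def absent_kmers(seq: str, k: int) -> set:
    """The 4**k possible k-mers minus those observed in seq."""
    universe = {"".join(p) for p in product("ACGT", repeat=k)}
    return universe - observed_kmers(seq, k)


def shortest_absent_length(seq: str, max_k: int = 8):
    """Smallest k (up to max_k) at which at least one k-mer is absent."""
    for k in range(1, max_k + 1):
        if absent_kmers(seq, k):
            return k
    return None


def common_absent_kmers(seqs, k: int) -> set:
    """k-mers absent from every sequence in seqs."""
    return set.intersection(*(absent_kmers(s, k) for s in seqs))


if __name__ == "__main__":
    toy = ["ACGT", "AACC"]  # toy strings, not real genomes
    print(shortest_absent_length(toy[0]))
    print(sorted(common_absent_kmers(toy, 2)))
```

For real genomes the set of all 4^k k-mers and the observed set become too large for this approach at k between 10 and 14 on commodity memory budgets, which is why a dedicated k-mer counter such as Jellyfish is used in practice.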

AVAILABILITY AND IMPLEMENTATION

We developed a Jellyfish-based LAUP analysis (JBLA) application by integrating Jellyfish (Marçais and Kingsford, 2011), MEME (Bailey et al., 2009) and the NCBI genome database (Pruitt et al., 2007) applications, which are listed as Supplementary Material.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
