k-mer manifold approximation and projection for visualizing DNA sequences.

Author information

Fu Chengbo, Niskanen Einari A, Wei Gong-Hong, Yang Zhirong, Sanvicente-García Marta, Güell Marc, Cheng Lu

Affiliations

Department of Computer Science, School of Science, Aalto University, 02150 Espoo, Finland.

Institute of Biomedicine, University of Eastern Finland, 70211 Kuopio, Finland.

Publication information

Genome Res. 2025 May 2;35(5):1234-1246. doi: 10.1101/gr.279458.124.

DOI:10.1101/gr.279458.124
PMID:40210440
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12047656/
Abstract

Identifying and illustrating patterns in DNA sequences are crucial tasks in various biological data analyses. In this task, patterns are often represented by sets of k-mers, the fundamental building blocks of DNA sequences. To visually unveil these patterns, one could project each k-mer onto a point in two-dimensional (2D) space. However, this projection poses challenges owing to the high-dimensional nature of k-mers and their unique mathematical properties. Here, we establish a mathematical system to address the peculiarities of the k-mer manifold. Leveraging this k-mer manifold theory, we develop a statistical method named KMAP for detecting k-mer patterns and visualizing them in 2D space. We applied KMAP to three distinct data sets to showcase its utility. KMAP achieves a comparable performance to the classical method MEME, with ∼90% similarity in motif discovery from HT-SELEX data. In the analysis of H3K27ac ChIP-seq data from Ewing sarcoma (EWS), we find that BACH1, OTX2, and KNCH2 might affect EWS prognosis by binding to promoter and enhancer regions across the genome. We also observe potential colocalization of BACH1, OTX2, and the motif CCCAGGCTGGAGTGC in ∼70 bp windows in the enhancer regions. Furthermore, we find that FLI1 binds to the enhancer regions after ETV6 degradation, indicating competitive binding between ETV6 and FLI1. Moreover, KMAP identifies four prevalent patterns in gene editing data of the AAVS1 locus, aligning with findings reported in the literature. These applications underscore that KMAP can be a valuable tool across various biological contexts.
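
For readers unfamiliar with the term, a k-mer is simply a length-k substring of a DNA sequence. The sketch below is a generic illustration of counting k-mers and measuring how far apart two k-mers are; it is not KMAP's implementation. The function names `count_kmers` and `hamming`, the toy reads, and the choice of Hamming distance as the dissimilarity measure are assumptions made for this example only.

```python
from collections import Counter
from itertools import combinations


def count_kmers(sequences, k):
    """Count every length-k substring (k-mer) across the input sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def hamming(a, b):
    """Number of positions at which two equal-length k-mers differ."""
    if len(a) != len(b):
        raise ValueError("k-mers must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy reads, used only to exercise the functions.
toy_reads = ["ACGTACGT", "ACGTTCGT"]
counts = count_kmers(toy_reads, k=4)

# Pairwise distances among the three most frequent k-mers.
top = [kmer for kmer, _ in counts.most_common(3)]
pairwise = {(a, b): hamming(a, b) for a, b in combinations(top, 2)}
```

A table of pairwise distances like `pairwise` is the kind of input a generic 2D embedding method consumes. With k-mers of length k there are up to 4^k distinct points, which gives a sense of the high dimensionality the abstract refers to.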


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa30/12047656/60bd9143933e/1234f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa30/12047656/270dae528246/1234f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa30/12047656/5f660904e411/1234f03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa30/12047656/b024fef28f43/1234f04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa30/12047656/218c293769a4/1234f05.jpg

