MAFin: motif detection in multiple alignment files

Authors

Patsakis Michail, Provatas Kimonas, Baltoumas Fotis A, Chantzi Nikol, Mouratidis Ioannis, Pavlopoulos Georgios A, Georgakopoulos-Soares Ilias

Affiliations

Institute for Personalized Medicine, Department of Molecular and Precision Medicine, The Pennsylvania State University College of Medicine, Hershey, PA 17033, United States.

Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, United States.

Publication information

Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf125.

DOI:10.1093/bioinformatics/btaf125
PMID:40106711
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11978385/
Abstract

MOTIVATION

Whole-genome and whole-proteome alignments, represented in the multiple alignment format (MAF), have become a standard approach in comparative genomics and proteomics. Analyses of these alignments often require identifying conserved motifs, which is crucial for understanding functional and evolutionary relationships. However, current approaches lack a direct method for motif detection within MAF files. To address this gap, we present MAFin, a novel tool that enables efficient motif detection and conservation analysis in MAF files, streamlining genomic and proteomic research.

RESULTS

We developed MAFin, the first motif detection tool for Multiple Alignment Format files. MAFin enables the multithreaded search of conserved motifs using three approaches: (i) with user-specified k-mers, (ii) with regular expressions, in which case one or more patterns are searched, and (iii) with predefined Position Weight Matrices. Once a motif has been found, MAFin detects the motif instances and calculates their conservation across the aligned sequences. MAFin also calculates a conservation percentage, which describes how well each motif is conserved across the aligned sequences, based on the number of matches relative to the length of the motif. A set of statistics enables the interpretation of each motif's conservation level, and the detected motifs are exported in JSON and CSV files for downstream analyses.
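
The conservation percentage described above (matching positions relative to motif length) can be illustrated with a minimal Python sketch. This is not MAFin's code or API: the function names `parse_block` and `find_motifs`, the toy alignment block, and the choices to search only the first (reference) row and to count a gap as a mismatch are all assumptions made for illustration.

```python
import re

# Made-up MAF block for illustration only (not real genomic data).
# Each 's' line is: s <src> <start> <size> <strand> <srcSize> <aligned text>
MAF_BLOCK = """\
a score=100.0
s hg38.chr1    100 12 + 1000 ACGTGA-CCTGAC
s panTro6.chr1 200 12 + 2000 ACGTGA-CCTGAC
s mm39.chr4    300 12 + 3000 ACGTCAGCC-GAC
"""


def parse_block(text):
    """Return [(source_name, aligned_text), ...] for every 's' line."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "s":
            rows.append((fields[1], fields[6]))
    return rows


def find_motifs(rows, pattern):
    """Search the first row (assumed reference) and score the other rows.

    `pattern` is a regular expression; a plain k-mer can be passed through
    re.escape(). Conservation is 100 * matches / motif length, where a gap
    in an aligned row counts as a mismatch (an assumption of this sketch).
    """
    _, ref_text = rows[0]
    # Alignment columns that hold a real base in the reference row.
    cols = [i for i, ch in enumerate(ref_text) if ch != "-"]
    ungapped = "".join(ref_text[i] for i in cols).upper()

    hits = []
    for m in re.finditer(pattern, ungapped):
        motif = m.group(0)
        if not motif:  # ignore empty regex matches
            continue
        hit_cols = cols[m.start():m.end()]
        conservation = {}
        for name, text in rows[1:]:
            matches = sum(
                1 for base, c in zip(motif, hit_cols) if text[c].upper() == base
            )
            conservation[name] = 100.0 * matches / len(motif)
        hits.append(
            {"motif": motif, "ref_offset": m.start(), "conservation_pct": conservation}
        )
    return hits


if __name__ == "__main__":
    rows = parse_block(MAF_BLOCK)
    for hit in find_motifs(rows, re.escape("TGA")):  # k-mer search
        print(hit)
    for hit in find_motifs(rows, r"C{2}TG"):  # regular-expression search
        print(hit)
```

In this toy block, the k-mer `TGA` occurs twice in the reference row. The chimpanzee row matches every position of both instances, while the mouse row matches two of three positions in each. `ref_offset` is the offset within the ungapped reference text of this block; adding the start field of the `s` line would give a source coordinate.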

AVAILABILITY AND IMPLEMENTATION

MAFin is offered as a Python package under the GPL license as a multi-platform application and is available at: https://github.com/Georgakopoulos-Soares-lab/MAFin.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/872e/11978385/27ca8a10f8cd/btaf125f1.jpg

Similar articles

1
MAFin: motif detection in multiple alignment files.
Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf125.
2
MAFin: Motif Detection in Multiple Alignment Files.
ArXiv. 2024 Oct 14:arXiv:2410.11021v1.
3
MAFcounter: An efficient tool for counting the occurrences of k-mers in MAF files.
ArXiv. 2024 Nov 29:arXiv:2411.19427v1.
4
PairK: Pairwise k-mer alignment for quantifying protein motif conservation in disordered regions.
Protein Sci. 2025 Jan;34(1):e70004. doi: 10.1002/pro.70004.
5
Motif scraper: a cross-platform, open-source tool for identifying degenerate nucleotide motif matches in FASTA files.
Bioinformatics. 2018 Nov 15;34(22):3926-3928. doi: 10.1093/bioinformatics/bty437.
6
BlockLogo: visualization of peptide and sequence motif conservation.
J Immunol Methods. 2013 Dec 31;400-401:37-44. doi: 10.1016/j.jim.2013.08.014. Epub 2013 Aug 31.
7
CompariMotif: quick and easy comparisons of sequence motifs.
Bioinformatics. 2008 May 15;24(10):1307-9. doi: 10.1093/bioinformatics/btn105. Epub 2008 Mar 28.
8
AL2CO: calculation of positional conservation in a protein sequence alignment.
Bioinformatics. 2001 Aug;17(8):700-12. doi: 10.1093/bioinformatics/17.8.700.
9
NestedMICA as an ab initio protein motif discovery tool.
BMC Bioinformatics. 2008 Jan 14;9:19. doi: 10.1186/1471-2105-9-19.
10
Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model.
BMC Bioinformatics. 2004 Oct 25;5:157. doi: 10.1186/1471-2105-5-157.

Cited by

1
Landscape and mutational dynamics of G-quadruplexes in the complete human genome and in haplotypes of diverse ancestry.
bioRxiv. 2025 Jun 25:2025.06.17.660256. doi: 10.1101/2025.06.17.660256.