Patsakis Michail, Provatas Kimonas, Baltoumas Fotis A, Chantzi Nikol, Mouratidis Ioannis, Pavlopoulos Georgios A, Georgakopoulos-Soares Ilias
Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA.
Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA.
ArXiv. 2024 Oct 14:arXiv:2410.11021v1.
Genome and proteome alignments, represented in the Multiple Alignment Format (MAF), have become a standard approach in comparative genomics and proteomics. However, current approaches lack a direct method for motif detection within MAF files. To address this gap, we present MAFin, a novel tool that enables efficient motif detection and conservation analysis in MAF files, streamlining genomic and proteomic research.
We developed MAFin, the first motif detection tool for Multiple Alignment Format files. MAFin enables multithreaded search for conserved motifs using three approaches: 1) user-specified k-mers, 2) regular expressions, in which case one or more patterns are searched, and 3) predefined Position Weight Matrices. MAFin detects the motif instances and, for each one, calculates a conservation percentage across the aligned sequences, based on the number of matching positions relative to the length of the motif. A set of statistics enables the interpretation of each motif's conservation level, and the detected motifs are exported to JSON and CSV files for downstream analyses.
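The conservation percentage described above can be illustrated with a minimal Python sketch. This is not MAFin's code or its API: the function names (`find_motif_columns`, `conservation_percent`, `scan_block`), the toy alignment block, and the choice to count a gap as a mismatch are all assumptions made for illustration. The sketch searches the ungapped reference row with a regular expression (a plain k-mer is the simplest pattern), maps each hit back to alignment columns, and scores every other row as matches divided by motif length.

```python
import re

def find_motif_columns(gapped_ref, pattern):
    """Yield (motif, columns) for each regex hit in the ungapped reference row."""
    # Alignment column of every non-gap character in the reference row.
    columns = [i for i, ch in enumerate(gapped_ref) if ch != "-"]
    ungapped = gapped_ref.replace("-", "").upper()
    # A capturing group inside a lookahead reports overlapping hits too.
    for m in re.finditer(f"(?=({pattern}))", ungapped):
        start, end = m.start(1), m.end(1)
        if end > start:  # skip zero-length matches
            yield ungapped[start:end], columns[start:end]

def conservation_percent(motif, gapped_seq, cols):
    """Matches relative to motif length; a gap counts as a mismatch (assumption)."""
    matches = sum(1 for base, c in zip(motif, cols) if gapped_seq[c].upper() == base)
    return 100.0 * matches / len(motif)

def scan_block(block, ref_name, pattern):
    """Score every motif hit in the reference against the other aligned rows."""
    results = []
    for motif, cols in find_motif_columns(block[ref_name], pattern):
        per_row = {
            name: conservation_percent(motif, seq, cols)
            for name, seq in block.items()
            if name != ref_name
        }
        results.append({"motif": motif, "start_column": cols[0], "conservation": per_row})
    return results

# Hypothetical alignment block: equal-length gapped rows keyed by species name.
block = {
    "ref": "ACGT-TGACGTCA",
    "sp1": "ACGTATGACGTCA",
    "sp2": "ACGT-TGACCTCA",
    "sp3": "ACGT-T--CGTCA",
}

hits = scan_block(block, "ref", "TGACG")
```

In this toy block the single hit starts at alignment column 5; `sp1` matches at all five positions, `sp2` at four, and `sp3` at three, because its two gaps count as mismatches under the assumption above.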
MAFin is released as a Python package under the GPL license as a multi-platform application and is available at: https://github.com/Georgakopoulos-Soares-lab/MAFin.