Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China.
Division of Laboratory Medicine, Microbiome Medicine Center, Zhujiang Hospital, Southern Medical University, Guangzhou, China.
Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac222.
Identifying the conserved and variable regions in a multiple sequence alignment (MSA) is critical to accelerating our understanding of gene function. MSA visualizations transform sequence features into interpretable visual representations. As the sequence-structure-function relationship gains increasing attention in molecular biology, simply displaying nucleotide or protein sequence alignments is no longer sufficient. More scalable visualizations are needed to broaden the scope of sequence investigation. Here we present ggmsa, an R package for mining comprehensive sequence features and integrating data associated with an MSA through a variety of display methods. To uncover sequence conservation patterns, variations and recombination at the site level, ggmsa implements sequence bundles, sequence logos, stacked sequence alignments and comparative plots. ggmsa supports integrating correlations between MSA sequences and their phenotypes, as well as other traits such as ancestral sequences, molecular structures, molecular functions and expression levels. We also designed a new visualization method for genome alignments in Multiple Alignment Format (MAF) to explore patterns of within- and between-species variation. By combining these visual representations with prior knowledge, ggmsa helps researchers explore MSAs and make decisions. The ggmsa package is open-source software released under the Artistic-2.0 license and is freely available on Bioconductor (https://bioconductor.org/packages/ggmsa) and GitHub (https://github.com/YuLab-SMU/ggmsa).
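To make "conserved and variable regions" concrete, here is a minimal Python sketch of one common way to score alignment columns: per-column Shannon entropy, where 0 bits means every sequence carries the same residue. This is not ggmsa's implementation (ggmsa is an R package); the function names, the gap symbol `-`, and the 0.5-bit threshold are illustrative assumptions.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of the residues in one alignment column.

    Gap characters are ignored (an assumption; other schemes treat gaps
    as a symbol). Returns NaN for an all-gap column.
    """
    residues = [c for c in column.upper() if c != gap]
    if not residues:
        return float("nan")
    n = len(residues)
    counts = Counter(residues)
    # H = sum p * log2(1/p); written this way so a fully conserved column gives 0.0
    return sum((k / n) * math.log2(n / k) for k in counts.values())


def conservation_profile(msa: list[str]) -> list[float]:
    """Entropy for every column of an alignment given as equal-length strings."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(seq[i] for seq in msa)) for i in range(length)]


def classify_columns(msa: list[str], threshold: float = 0.5) -> list[str]:
    """Label columns 'conserved' (entropy <= threshold) or 'variable'.

    The threshold is a hypothetical cut-off chosen for illustration;
    NaN (all-gap) columns fall through to 'variable'.
    """
    return [
        "conserved" if h <= threshold else "variable"
        for h in conservation_profile(msa)
    ]


if __name__ == "__main__":
    toy_msa = ["ACGT-", "ACGA-", "ACTTA"]  # toy DNA alignment, not real data
    print(conservation_profile(toy_msa))
    print(classify_columns(toy_msa))
```

Because gaps are ignored, a column with a single non-gap residue scores as fully conserved; a real analysis would usually also weigh gap frequency, which is one reason visual inspection of the alignment remains useful.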