Tumescheit Charlotte, Firth Andrew E, Brown Katherine
Department of Pathology, University of Cambridge, Cambridge, United Kingdom.
PeerJ. 2022 Mar 15;10:e12983. doi: 10.7717/peerj.12983. eCollection 2022.
Throughout biology, multiple sequence alignments (MSAs) form the basis of much investigation into biological features and relationships. These alignments are at the heart of many bioinformatics analyses. However, sequences in MSAs are often incomplete or very divergent, which can lead to poor alignment and large gaps. These problems slow down computation and can influence conclusions even though they are not biologically meaningful. Cleaning the alignment by addressing common issues such as gaps, divergent sequences, large insertions and deletions, and poorly aligned sequence ends can substantially improve downstream analyses. Manual editing of MSAs is widespread but time-consuming and difficult to reproduce.
We present CIAlign, a comprehensive, user-friendly MSA trimming tool with multiple visualisation options. This highly customisable command line tool aims to put the user in control by offering a wide range of options, and it outputs graphical representations of the alignment before and after processing so that the user has a clear overview of what has been removed. Its main functions are removing regions of low coverage caused by insertions, removing gaps, cropping poorly aligned sequence ends, and removing sequences that are too divergent or too short. The user can specify the threshold for each function and adjust parameters to suit each individual MSA. CIAlign is designed with an emphasis on solving specific, common alignment problems and on providing transparency to the user.
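Three of the cleaning operations listed above (removing gap-only columns, removing short sequences and removing divergent sequences) can be illustrated with a minimal Python sketch. This is not CIAlign's implementation or interface: the function names, the dict-of-strings alignment representation, `-` as the gap character, and the definition of divergence as identity to a majority-rule consensus are all assumptions made for illustration.

```python
from collections import Counter

GAP = "-"  # assumed gap character


def remove_gap_only_columns(alignment):
    """Drop alignment columns in which every sequence has a gap."""
    if not alignment:
        return {}
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(len(seqs[0])) if any(s[i] != GAP for s in seqs)]
    return {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}


def remove_short(alignment, min_length):
    """Drop sequences with fewer than min_length non-gap residues."""
    return {n: s for n, s in alignment.items()
            if len(s.replace(GAP, "")) >= min_length}


def consensus(alignment):
    """Most common non-gap character per column (a gap if the column is all gaps)."""
    cons = []
    for column in zip(*alignment.values()):
        counts = Counter(c for c in column if c != GAP)
        cons.append(counts.most_common(1)[0][0] if counts else GAP)
    return "".join(cons)


def identity_to(seq, ref):
    """Fraction of matching positions, counted over columns where either string has a residue."""
    pairs = [(a, b) for a, b in zip(seq, ref) if a != GAP or b != GAP]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def remove_divergent(alignment, min_identity):
    """Drop sequences whose identity to the consensus is below min_identity."""
    cons = consensus(alignment)
    return {n: s for n, s in alignment.items()
            if identity_to(s, cons) >= min_identity}


# Hypothetical toy alignment: column 3 is gap-only, seq3 is short, seq4 is divergent.
aln = {
    "seq1": "AC-GT-A",
    "seq2": "AC-GTTA",
    "seq3": "AC-G--A",
    "seq4": "TG-CATC",
}
cleaned = remove_gap_only_columns(aln)
cleaned = remove_short(cleaned, min_length=5)
cleaned = remove_divergent(cleaned, min_identity=0.5)
print(cleaned)  # {'seq1': 'ACGT-A', 'seq2': 'ACGTTA'}
```

The order of operations matters in a sketch like this: removing short or divergent sequences first changes the consensus, and removing sequences can create new gap-only columns, so a real pipeline would need to decide how to sequence and possibly repeat these steps.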
CIAlign effectively removes problematic regions and sequences from MSAs and provides novel visualisation options. It can be used to fine-tune alignments for further analysis and processing. The tool is aimed at anyone who wishes to clean up parts of an MSA automatically, and at those who need a new, accessible way to visualise large MSAs.