Sakai Ryo, Aerts Jan
Department of Electrical Engineering (ESAT) STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium; iMinds Medical IT, Leuven, Belgium.
BMC Proc. 2014 Aug 28;8(Suppl 2 Proceedings of the 3rd Annual Symposium on Biologica):S9. doi: 10.1186/1753-6561-8-S2-S9. eCollection 2014.
The sequence logo is a graphical representation of a set of aligned sequences, commonly used to depict the conservation of amino acid or nucleotide sequences. Although it effectively communicates the amount of information present at every position, this visual representation falls short when the domain task is to compare two or more sets of aligned sequences. We present a new visual representation called a Sequence Diversity Diagram and validate our design choices with a case study.
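For context, the "amount of information" a sequence logo shows at each position is conventionally the column's information content: log2 of the alphabet size minus the Shannon entropy of the residue frequencies in that column. The sketch below is not the authors' implementation; it assumes gaps ('-') are simply ignored and omits the small-sample correction that many logo tools apply. The function names `column_information` and `information_profile` are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 4) -> float:
    """Information content (bits) of one alignment column.

    Assumption: gap characters ('-') are ignored and no small-sample
    correction is applied; use alphabet_size=20 for proteins.
    """
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    # Shannon entropy of the observed residue frequencies.
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def information_profile(alignment: list[str], alphabet_size: int = 4) -> list[float]:
    """Per-position information content for a set of aligned sequences."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned (equal length)")
    return [
        column_information("".join(s[i] for s in alignment), alphabet_size)
        for i in range(length)
    ]


# Toy DNA alignment, for illustration only.
dna = ["ACGT", "ACGA", "ACTT", "ACGC"]
print([round(x, 3) for x in information_profile(dna)])  # [2.0, 2.0, 1.189, 0.5]
```

Each value corresponds to the height of one stack in a logo. Two sets are compared by drawing two such profiles side by side, which is the comparison task the abstract identifies as a weakness of the sequence logo.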
Our software was developed in Processing, an open-source programming environment. It loads multiple sequence alignment files in FASTA format, together with a configuration file that can be edited to adjust the visualization.
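The tool itself is written in Processing, and the abstract does not describe its configuration-file format. The following is therefore only a language-neutral Python sketch of the loading step, assuming one aligned FASTA file per set of sequences (for example, one per subfamily). It reads each file's text and checks that every set is properly aligned. The helpers `parse_fasta` and `load_alignment_sets` are hypothetical, and the sequences are toy data.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining multi-line sequences.

    Note: a repeated header overwrites the earlier record.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data before first '>' header")
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def load_alignment_sets(fasta_texts: dict[str, str]) -> dict[str, list[str]]:
    """Hypothetical loader: one aligned FASTA text per named set."""
    sets: dict[str, list[str]] = {}
    for name, text in fasta_texts.items():
        seqs = list(parse_fasta(text).values())
        # Aligned sequences (gaps included) must all have the same length.
        if len({len(s) for s in seqs}) > 1:
            raise ValueError(f"{name}: sequences are not all the same length")
        sets[name] = seqs
    return sets


set_a = """>seq1
ACG-T
>seq2
ACGAT
"""
set_b = """>seq3
TCG-T
>seq4
TC
GAT
"""
sets = load_alignment_sets({"set_a": set_a, "set_b": set_b})
print(sets)  # {'set_a': ['ACG-T', 'ACGAT'], 'set_b': ['TCG-T', 'TCGAT']}
```

Keeping each set in its own file makes the comparison unit explicit: every loaded set would become one group in the diagram.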
The redesigned figure improves the visual comparison of two or more sets of aligned sequences and additionally encodes information on sequential position conservation. In our case study of the adenylate kinase lid domain, the Sequence Diversity Diagram reveals unexpected patterns and yields new insights, such as the identification of subgroups within the protein subfamily. Our future work will integrate this visual encoding into interactive visualization tools to support higher-level data exploration tasks.