ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment.

Affiliation

Department of Physiology and Integrated Biosystems, College of Medicine, Inje University, Busan 614-735, South Korea.

Publication details

BMC Bioinformatics. 2010 Sep 17;11:467. doi: 10.1186/1471-2105-11-467.

DOI: 10.1186/1471-2105-11-467
PMID: 20849574
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2949895/

Abstract

BACKGROUND

There is an increasing demand to assemble and align large-scale biological sequence data sets. Commonly used multiple sequence alignment programs remain limited in their ability to handle very large numbers of sequences because they lack a scalable high-performance computing (HPC) environment with greatly extended data storage capacity.

RESULTS

We designed ClustalXeed, a software system for multiple sequence alignment that incrementally improves on the earlier ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align a large family of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pairwise alignment computation. The computational efficiency of the disk-storage system was markedly improved by implementing an efficient load-balancing algorithm, the "idle node-seeking task algorithm" (INSTA). The new editing option and the graphical user interface (GUI) provide ready access to a parallel-computing environment for users who seek fast and easy alignment of large DNA and protein sequence sets.
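To make the idea of moving the distance matrix out of RAM concrete, here is a minimal single-machine sketch in Python. It is not ClustalXeed's implementation: the abstract does not describe the file layout, so the flat upper-triangle file, the class name `DiskDistanceMatrix`, the helper `build_matrix`, and the `toy_distance` function (a mismatch fraction standing in for a real pairwise alignment score) are all assumptions made for illustration.

```python
import os
import struct
import tempfile
from itertools import combinations


def pair_index(i, j, n):
    """Position of pair (i, j), with i < j, in a row-major upper triangle."""
    return i * n - i * (i + 1) // 2 + (j - i - 1)


def toy_distance(a, b):
    """Placeholder distance (assumption): mismatch fraction over the shorter length."""
    m = min(len(a), len(b))
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / m if m else 0.0


class DiskDistanceMatrix:
    """Symmetric distance matrix whose upper triangle lives in a file, not in RAM."""

    RECORD = struct.Struct("<d")  # one little-endian float64 per pair

    def __init__(self, path, n):
        self.n = n
        self.path = path
        size = n * (n - 1) // 2 * self.RECORD.size
        with open(path, "wb") as f:
            f.truncate(size)  # pre-allocate the whole triangle on disk
        self.f = open(path, "r+b")

    def _seek(self, i, j):
        if i > j:
            i, j = j, i
        self.f.seek(pair_index(i, j, self.n) * self.RECORD.size)

    def set(self, i, j, value):
        self._seek(i, j)
        self.f.write(self.RECORD.pack(value))

    def get(self, i, j):
        if i == j:
            return 0.0  # the diagonal is never stored
        self._seek(i, j)
        return self.RECORD.unpack(self.f.read(self.RECORD.size))[0]

    def close(self):
        self.f.close()


def build_matrix(seqs, path):
    """Compute every pairwise distance and write each one straight to disk."""
    matrix = DiskDistanceMatrix(path, len(seqs))
    for i, j in combinations(range(len(seqs)), 2):
        matrix.set(i, j, toy_distance(seqs[i], seqs[j]))
    return matrix


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "dist.bin")
    m = build_matrix(["ACGT", "ACGA", "TTTT"], path)
    print(m.get(0, 1), m.get(0, 2), m.get(1, 2))
    m.close()
```

Only one 8-byte record is held in memory at a time, so the matrix size is bounded by disk capacity rather than RAM. A distributed version would additionally have to decide which node owns which part of the file, which this sketch does not attempt.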

CONCLUSIONS

ClustalXeed can now process large volumes of biological sequence data that were not tractable with any other parallel or single MSA program. The main developments are: 1) the ability to tackle larger sequence alignment problems than previous systems could, through markedly improved storage-handling capabilities; 2) an efficient task load-balancing algorithm, INSTA, which improves overall processing times for multiple sequence alignment with input sequences of non-uniform length; and 3) support for both single-PC and distributed cluster systems.
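Why load balancing matters for sequences of non-uniform length can be shown with a small simulation. The abstract does not spell out how INSTA works, so the sketch below is not INSTA itself: it compares a fixed up-front split of the pairwise tasks with a generic greedy rule in which each task goes to whichever worker becomes idle first. The function names and the cost model (cost of a pair proportional to the product of the two sequence lengths, as for a pairwise dynamic-programming table) are assumptions.

```python
import heapq


def pair_costs(lengths):
    """Assumed cost of aligning each pair (i, j): len_i * len_j, in pair order."""
    n = len(lengths)
    return [lengths[i] * lengths[j] for i in range(n) for j in range(i + 1, n)]


def static_makespan(costs, workers):
    """Split the task list into contiguous equal-count chunks before starting."""
    if not costs:
        return 0
    chunk = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[k:k + chunk]) for k in range(0, len(costs), chunk))


def dynamic_makespan(costs, workers):
    """Hand each task, in order, to the worker that becomes idle first."""
    free_at = [0] * workers  # time at which each worker is next idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)  # earliest idle worker
        heapq.heappush(free_at, start + cost)
    return max(free_at)


if __name__ == "__main__":
    lengths = [1000, 900, 100, 100, 100, 100]  # two long sequences, four short
    costs = pair_costs(lengths)
    print("static :", static_makespan(costs, workers=3))
    print("dynamic:", dynamic_makespan(costs, workers=3))
```

With equal-count chunks, the worker that receives the pairs involving the longest sequence finishes far later than the others. The idle-first rule lets the remaining workers absorb the short tasks in the meantime, so the total run time is bounded by the single most expensive pair rather than by an unlucky chunk.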
