Using CLUSTAL for multiple sequence alignments.

Authors

Higgins D G, Thompson J D, Gibson T J

Affiliation

European Molecular Biology Laboratory Outstation-European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom.

Publication

Methods Enzymol. 1996;266:383-402. doi: 10.1016/s0076-6879(96)66024-8.

PMID:8743695
Abstract

We have tested CLUSTAL W in a wide variety of situations, and it is capable of handling some very difficult protein alignment problems. If the data set consists of enough closely related sequences so that the first alignments are accurate, then CLUSTAL W will usually find an alignment that is very close to ideal. Problems can still occur if the data set includes sequences of greatly different lengths or if some sequences include long regions that are impossible to align with the rest of the data set. Trying to balance the need for long insertions and deletions in some alignments with the need to avoid them in others is still a problem. The default values for our parameters were tested empirically using test cases of sets of globular proteins where some information as to the correct alignment was available. The parameter values may not be very appropriate with nonglobular proteins. We have argued that using one weight matrix and two gap penalties is too simplistic to be of general use in the most difficult cases. We have replaced these parameters with a large number of new parameters designed primarily to help encourage gaps in loop regions. Although these new parameters are largely heuristic in nature, they perform surprisingly well and are simple to implement. The underlying speed of the progressive alignment approach is not adversely affected. The disadvantage is that the parameter space is now huge; the number of possible combinations of parameters is more than can easily be examined by hand. We justify this by asking the user to treat CLUSTAL W as a data exploration tool rather than as a definitive analysis method. It is not sensible to automatically derive multiple alignments and to trust particular algorithms as being capable of always getting the correct answer. One must examine the alignments closely, especially in conjunction with the underlying phylogenetic tree (or estimate of it) and try varying some of the parameters. 
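The abstract says that the single weight matrix and two gap penalties were replaced by parameters meant to encourage gaps in loop regions. Below is a minimal Python sketch of one idea in that spirit: loops are assumed to show up as runs of hydrophilic residues, and the gap-opening penalty is lowered inside such runs. The residue set, the minimum run length of 5, the two-thirds scaling and the function names are illustrative assumptions, not values or code taken from this chapter or from CLUSTAL W.

```python
"""Sketch: position-specific gap-opening penalties that favour gaps in loops.

All constants here are hypothetical, chosen only for illustration.
"""

# Assumed set of hydrophilic, loop-favouring residues (illustrative).
HYDROPHILIC = set("DEGKNPQRS")


def gap_open_penalties(seq: str, base_open: float = 10.0,
                       min_run: int = 5, factor: float = 2 / 3) -> list[float]:
    """Return one gap-opening penalty per position of `seq`.

    Positions inside a run of at least `min_run` consecutive hydrophilic
    residues get `base_open * factor`; every other position keeps `base_open`.
    """
    penalties = [base_open] * len(seq)
    i = 0
    while i < len(seq):
        if seq[i] in HYDROPHILIC:
            # Find the end of this hydrophilic run.
            j = i
            while j < len(seq) and seq[j] in HYDROPHILIC:
                j += 1
            if j - i >= min_run:
                for k in range(i, j):
                    penalties[k] = base_open * factor
            i = j
        else:
            i += 1
    return penalties


def affine_gap_cost(open_penalty: float, extend: float, length: int) -> float:
    """Cost of a single gap of `length` residues.

    Uses one common affine convention: open once, then extend for each
    additional residue. Other programs count the extension differently.
    """
    return open_penalty + extend * (length - 1)


if __name__ == "__main__":
    seq = "AVLDEGKNSLIV"  # toy sequence with a 6-residue hydrophilic run
    print(gap_open_penalties(seq))
    print(affine_gap_cost(10.0, 0.2, 3))
```

Keeping the penalty per position, rather than as one global constant, is what lets a dynamic-programming aligner prefer opening gaps in the hydrophilic stretch while the rest of the scoring stays unchanged.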
Outliers (sequences that have no close relatives) should be aligned carefully, as should fragments of sequences. The program will automatically delay the alignment of any sequences that are less than 40% identical to any others until all other sequences are aligned, but this can be set from a menu by the user. It may be useful to build up an alignment of closely related sequences first and to then add in the more distant relatives one at a time or in batches, using the profile alignments and weighting scheme described earlier and perhaps using a variety of parameter settings. We give one example using SH2 domains. SH2 domains are widespread in eukaryotic signalling proteins where they function in the recognition of phosphotyrosine-containing peptides. In the chapter by Bork and Gibson ([11], this volume), Blast and pattern/profile searches were used to extract the set of known SH2 domains and to search for new members. (Profiles used in database searches are conceptually very similar to the profiles used in CLUSTAL W: see the chapters [11] and [13] for profile search methods.) The profile searches detected SH2 domains in the JAK family of protein tyrosine kinases, which were thought not to contain SH2 domains. Although the JAK family SH2 domains are rather divergent, they have the necessary core structural residues as well as the critical positively charged residue that binds phosphotyrosine, leaving no doubt that they are bona fide SH2 domains. The five new JAK family SH2 domains were added sequentially to the existing alignment of 65 SH2 domains using the CLUSTAL W profile alignment option. Figure 6 shows part of the resulting alignment. Despite their divergent sequences, the new SH2 domains have been aligned nearly perfectly with the old set. No insertions were placed in the original SH2 domains. In this example, the profile alignment procedure has produced better results than a one-step full alignment of all 70 SH2 domains, and in considerably less time. 
(ABSTRACT TRUNCATED)
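The abstract states that sequences less than 40% identical to any other sequence are delayed until the rest have been aligned, and that this cutoff can be changed by the user. A minimal Python sketch of that selection step follows. It simplifies heavily: sequences are given as gapped strings of equal length, identity is counted only over columns where neither sequence has a gap, and the names `percent_identity` and `split_divergent` are hypothetical. CLUSTAL W's own identity scores come from its pairwise-alignment stage and may be defined differently.

```python
"""Sketch: deciding which sequences to delay before progressive alignment."""


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


def split_divergent(seqs: dict[str, str],
                    cutoff: float = 40.0) -> tuple[list[str], list[str]]:
    """Split sequence names into (align_first, delayed).

    A sequence is delayed when its best identity to any *other* sequence
    is below `cutoff` (40% in the abstract, user-adjustable).
    """
    first: list[str] = []
    delayed: list[str] = []
    for name, s in seqs.items():
        best = max(
            (percent_identity(s, t) for other, t in seqs.items() if other != name),
            default=0.0,
        )
        (first if best >= cutoff else delayed).append(name)
    return first, delayed


if __name__ == "__main__":
    toy = {"s1": "ACDEFGHIKL", "s2": "ACDEFGHIKM", "s3": "WWWWWWWWWL"}
    print(split_divergent(toy))
```

In the workflow the abstract describes, the delayed sequences would then be added to the finished alignment one at a time or in batches by profile alignment, as was done for the five JAK family SH2 domains.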
