Towards a reliable objective function for multiple sequence alignments.

Authors

Thompson J D, Plewniak F, Ripp R, Thierry J C, Poch O

Affiliation

Laboratoire de Biologie et Génomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire, (CNRS/INSERM/ULP), Illkirch Cedex, 67404, France.

Publication

J Mol Biol. 2001 Dec 7;314(4):937-51. doi: 10.1006/jmbi.2001.5187.

Abstract

Multiple sequence alignment is a fundamental tool in many domains of modern molecular biology, including functional and evolutionary studies of protein families. Multiple alignments also play an essential role in the new integrated systems for genome annotation and analysis. The development of new multiple alignment scores and statistics is therefore essential, in the spirit of the work dedicated to evaluating pairwise sequence alignments for database searching techniques. We present here norMD, a new objective scoring function for multiple sequence alignments. NorMD combines the advantages of column-scoring techniques with the sensitivity of methods that incorporate residue similarity scores. In addition, norMD incorporates ab initio sequence information, such as the number, length and similarity of the sequences to be aligned. The sensitivity and reliability of the norMD objective function are demonstrated using structural alignments in the SCOP and BAliBASE databases. The norMD scores are then applied to the multiple alignments of complete sequences (MACS) detected by BlastP with E-value < 10, for a set of 734 hypothetical proteins encoded by the Vibrio cholerae genome. Unrelated or badly aligned sequences were automatically removed from the MACS, leaving a high-quality multiple alignment that could be reliably exploited in a subsequent functional and/or structural annotation process. After removal of unreliable sequences, 176 (24%) of the alignments contained at least one sequence with a functional annotation. Of these new matches, 103 were supported by significant hits to the Interpro domain and motif database.
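The abstract does not give the norMD formula, so the following is only a minimal sketch of the general idea it names: scoring an alignment column by column while using residue similarity rather than strict identity. Everything here is an assumption made for illustration: the function names, the toy residue groups, the 1.0/0.5/0.0 similarity values, and the normalization by pair count and alignment length. It is not the authors' method and does not use the sequence number, length or similarity information that norMD incorporates.

```python
from itertools import combinations

# Hypothetical residue groups, chosen only for illustration.
# A real scorer would use a substitution matrix instead.
TOY_GROUPS = ["AVLIM", "FWY", "KRH", "DE", "STNQ"]


def toy_similarity(a: str, b: str) -> float:
    """Return an illustrative similarity for two aligned characters."""
    if a == "-" or b == "-":
        return 0.0  # assumption: any pair involving a gap scores zero
    if a == b:
        return 1.0
    if any(a in group and b in group for group in TOY_GROUPS):
        return 0.5
    return 0.0


def column_score(column) -> float:
    """Mean pairwise similarity over all residue pairs in one column."""
    pairs = list(combinations(column, 2))
    if not pairs:
        return 0.0
    return sum(toy_similarity(a, b) for a, b in pairs) / len(pairs)


def alignment_score(rows) -> float:
    """Mean column score over an alignment given as equal-length strings."""
    if not rows:
        raise ValueError("alignment has no sequences")
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("aligned sequences must have equal length")
    if length == 0:
        return 0.0
    # zip(*rows) yields one tuple of characters per alignment column.
    return sum(column_score(col) for col in zip(*rows)) / length
```

Under these assumptions, `alignment_score(["AK", "VD"])` rewards the conservative A/V pairing but not the K/D pairing, which is the kind of distinction a pure identity-based column score would miss.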
