Reduction, alignment and visualisation of large diverse sequence families.

Author information

Taylor William R

Affiliation

Francis Crick Institute, 1 Midland Rd., London, NW1 1AT, UK.

Publication information

BMC Bioinformatics. 2016 Aug 2;17(1):300. doi: 10.1186/s12859-016-1059-9.

Abstract

BACKGROUND

Current volumes of sequence data mean that a single search can return a very large number of hits, typically tens to hundreds of thousands. From these raw results it is often difficult to tell whether the search has succeeded or has picked up sequences with little or no relationship to the query. The best approach to this problem is to cluster and align the resulting families; however, existing methods concentrate on fast clustering and either do not align the sequences or perform only a limited alignment.

RESULTS

A method (MULSEL) is presented that combines fast peptide-based pre-sorting with a subsequent cascade of mini-alignments, each of which is generated with a robust profile/profile method. From these mini-alignments, representative sequences are selected using a combination of intrinsic and user-specified criteria, and the selected representatives form the sequence collection for the next cycle of alignment. For moderately sized sequence collections (tens of thousands of sequences), the method runs on a laptop computer within seconds or minutes.
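The abstract does not spell out MULSEL's pre-sorting, its profile/profile alignment or its selection criteria, so the Python sketch below only illustrates the general reduce-and-repeat cascade under stated assumptions. It swaps in simple stand-ins: Jaccard similarity of shared peptides (k-mers) for the pre-sort, fixed-size chunks of the sorted list in place of mini-alignments, and the group medoid (the member most similar to the others) as the representative. The function names `presort`, `representative` and `reduce_family` and the parameters `k`, `group_size` and `target` are hypothetical and are not MULSEL's interface.

```python
# Hypothetical sketch of a reduce-and-repeat cascade; NOT the MULSEL algorithm.
# Stand-ins (assumptions): k-mer Jaccard similarity replaces profile/profile
# alignment, and chunks of the pre-sorted list replace mini-alignments.


def kmers(seq: str, k: int = 3) -> set[str]:
    """Overlapping peptides of length k (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two peptide sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def presort(seqs: list[str], k: int = 3) -> list[str]:
    """Greedy order: each next sequence is the unplaced one sharing the most
    peptides with the last placed one, so similar sequences become adjacent.
    Ties go to the lowest input index. This is quadratic, not 'fast'."""
    if not seqs:
        return []
    sets = [kmers(s, k) for s in seqs]
    remaining = set(range(1, len(seqs)))
    order = [0]
    while remaining:
        last = sets[order[-1]]
        nxt = max(remaining, key=lambda j: (kmer_similarity(last, sets[j]), -j))
        remaining.remove(nxt)
        order.append(nxt)
    return [seqs[i] for i in order]


def representative(group: list[str], k: int = 3) -> str:
    """Medoid of the group: the member with the highest summed similarity to
    the other members (first such member on ties)."""
    sets = [kmers(s, k) for s in group]

    def score(i: int) -> float:
        return sum(kmer_similarity(sets[i], sets[j])
                   for j in range(len(group)) if j != i)

    return group[max(range(len(group)), key=score)]


def reduce_family(seqs: list[str], group_size: int = 4,
                  target: int = 10, k: int = 3) -> list[str]:
    """Repeat pre-sort -> chunk -> pick representatives until at most
    `target` sequences remain. group_size >= 2 guarantees each cycle shrinks."""
    if group_size < 2 or target < 1:
        raise ValueError("need group_size >= 2 and target >= 1")
    current = list(seqs)
    while len(current) > target:
        ordered = presort(current, k)
        groups = [ordered[i:i + group_size]
                  for i in range(0, len(ordered), group_size)]
        current = [representative(g, k) for g in groups]
    return current
```

For example, `reduce_family(["AAAAAA", "WWWWWW", "AAAAAC", "WWWWWY"], group_size=2, target=2)` places the two A-rich and the two W-rich sequences next to each other, keeps one of each, and returns `["AAAAAA", "WWWWWW"]`. Because the greedy pre-sort compares every remaining sequence at each step, it scales quadratically and only stands in for a genuinely fast pre-sorting stage.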

CONCLUSIONS

MULSEL bridges the gap between fast clustering methods and slower multiple sequence alignment methods, providing a seamless transition from one to the other. Furthermore, it presents the resulting reduced family graphically, making it clear whether family members have been misaligned or whether any sequences appear inconsistent.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe09/4971687/418dbe0e8648/12859_2016_1059_Fig1_HTML.jpg