DivergentSet, a tool for picking non-redundant sequences from large sequence collections.

Authors

Widmann Jeremy, Hamady Micah, Knight Rob

Affiliation

Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado 80309, USA.

Publication

Mol Cell Proteomics. 2006 Aug;5(8):1520-32. doi: 10.1074/mcp.T600022-MCP200. Epub 2006 Jun 11.

Abstract

DivergentSet addresses the important but so far neglected bioinformatics task of choosing a representative set of sequences from a larger collection. We found that using a phylogenetic tree to guide the construction of divergent sets of sequences can be up to 2 orders of magnitude faster than the naive method of using a full distance matrix. By providing a user-friendly interface (available online) that integrates the tasks of finding additional sequences, building and refining the divergent set, producing random divergent sets from the same sequences, and exporting identifiers, this software facilitates a wide range of bioinformatics analyses including finding significant motifs and covariations. As an example application of DivergentSet, we demonstrate that the motifs identified by the motif-finding package MEME (Motif Elicitation by Maximum Entropy) are highly unstable with respect to the specific choice of sequences. This instability suggests that the types of sensitivity analysis enabled by DivergentSet may be widely useful for identifying the motifs of biological significance.
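To make the "naive method of using a full distance matrix" concrete, here is a minimal sketch of one way such a baseline could look. It is not the authors' implementation. The abstract does not specify the distance measure, the selection rule, or any threshold, so the following are all assumptions made for illustration: the Hamming-fraction distance on pre-aligned sequences, the greedy keep-if-far-enough rule, the `min_distance` parameter, and the function names `hamming_fraction` and `naive_divergent_set`.

```python
from itertools import combinations


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def naive_divergent_set(seqs: dict[str, str], min_distance: float) -> list[str]:
    """Greedy baseline (assumed, not from the paper): keep a sequence only if it
    is at least min_distance away from every sequence already kept."""
    ids = list(seqs)

    # Full pairwise distance matrix: n*(n-1)/2 comparisons, which is the
    # quadratic cost that a tree-guided approach would try to avoid.
    dist: dict[tuple[str, str], float] = {}
    for i, j in combinations(ids, 2):
        d = hamming_fraction(seqs[i], seqs[j])
        dist[i, j] = dist[j, i] = d

    chosen: list[str] = []
    for candidate in ids:
        if all(dist[candidate, kept] >= min_distance for kept in chosen):
            chosen.append(candidate)
    return chosen


if __name__ == "__main__":
    example = {"a": "AAAA", "b": "AAAT", "c": "TTTT", "d": "TTTA"}
    print(naive_divergent_set(example, min_distance=0.5))
```

In this sketch the result depends on input order, since the first sequence is always kept. Shuffling the input before selection is one simple way to obtain different divergent sets from the same sequences, although the abstract does not say how DivergentSet produces its random sets.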
