Estimation of pairwise sequence similarity of mammalian enhancers with word neighbourhood counts.

Author information

Department for Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany.

Publication information

Bioinformatics. 2012 Mar 1;28(5):656-63. doi: 10.1093/bioinformatics/bts028. Epub 2012 Jan 12.

Abstract

MOTIVATION

The identity of cells and tissues is to a large degree governed by transcriptional regulation. A major part of this regulation is accomplished by the combinatorial binding of transcription factors at regulatory sequences, such as enhancers. Even though binding of transcription factors is sequence-specific, estimating the sequence similarity of two functionally similar enhancers is very difficult. A similarity measure for regulatory sequences is nevertheless crucial for detecting and understanding functional similarities between two enhancers, and it will facilitate large-scale analyses such as clustering, prediction and classification of genome-wide datasets.

RESULTS

We present the standardized alignment-free sequence similarity measure N2, a flexible framework that is defined for word neighbourhoods. We explore the usefulness of adding reverse complement words, as well as words including mismatches, to the neighbourhood. On simulated enhancer sequences as well as functional enhancers in mouse development, N2 outperforms previous alignment-free measures. N2 is flexible, faster than competing methods and less susceptible to single-sequence noise and to the occurrence of repetitive sequences. Experiments on the mouse enhancers reveal that enhancers active in different tissues can be separated by pairwise comparison using N2.
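The idea of comparing two sequences through word neighbourhood counts can be illustrated with a small Python sketch. This is not the N2 definition from the paper and not the SeqAn implementation: the function names are hypothetical, and the standardization step simply centres the counts on their empirical mean within each sequence, which is an assumption made here for brevity. It only shows how reverse complements and single-mismatch words can be folded into a word's neighbourhood before two count vectors are compared.

```python
from collections import Counter
from itertools import product
from math import sqrt

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reverse complement of a DNA word."""
    return word.translate(COMPLEMENT)[::-1]


def neighbourhood(word, mismatches=False):
    """Word plus its reverse complement; optionally all 1-mismatch variants of both."""
    seeds = {word, reverse_complement(word)}
    if not mismatches:
        return seeds
    result = set(seeds)
    for seed in seeds:
        for i in range(len(seed)):
            for base in "ACGT":
                result.add(seed[:i] + base + seed[i + 1:])
    return result


def neighbourhood_counts(seq, k, mismatches=False):
    """For every k-mer over ACGT, sum the occurrences of all words in its neighbourhood."""
    word_counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = {}
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        counts[word] = sum(word_counts[n] for n in neighbourhood(word, mismatches))
    return counts


def standardized_vector(counts):
    """Centre counts on their mean and scale to unit length (illustrative assumption)."""
    words = sorted(counts)
    mean = sum(counts.values()) / len(words)
    centred = [counts[w] - mean for w in words]
    norm = sqrt(sum(x * x for x in centred))
    return [x / norm if norm else 0.0 for x in centred]


def similarity(seq_a, seq_b, k=3, mismatches=False):
    """Inner product of the two standardized neighbourhood-count vectors."""
    vec_a = standardized_vector(neighbourhood_counts(seq_a, k, mismatches))
    vec_b = standardized_vector(neighbourhood_counts(seq_b, k, mismatches))
    return sum(x * y for x, y in zip(vec_a, vec_b))
```

Because the reverse complement is part of every neighbourhood in this sketch, a sequence and its reverse complement produce identical count vectors, so `similarity(s, reverse_complement(s))` is 1 up to floating-point error for any sequence whose counts are not all equal. Setting `mismatches=True` widens each neighbourhood, which makes the counts more tolerant of single-base differences at the cost of specificity.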

CONCLUSION

N2 represents an improvement over previous alignment-free similarity measures without compromising speed, which makes it a good candidate for large-scale sequence comparison of regulatory sequences.

AVAILABILITY

The software is part of the open-source C++ library SeqAn (www.seqan.de) and a compiled version can be downloaded at http://www.seqan.de/projects/alf.html.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
