Genome comparison without alignment using shortest unique substrings.

Authors

Haubold Bernhard, Pierstorff Nora, Möller Friedrich, Wiehe Thomas

Affiliation

Department of Biotechnology and Bioinformatics, University of Applied Sciences, Weihenstephan, Germany.

Publication

BMC Bioinformatics. 2005 May 23;6:123. doi: 10.1186/1471-2105-6-123.

Abstract

BACKGROUND

Sequence comparison by alignment is a fundamental tool of molecular biology. In this paper we show how a number of sequence comparison tasks, including the detection of unique genomic regions, can be accomplished efficiently without an alignment step. Our procedure for nucleotide sequence comparison is based on shortest unique substrings. These are substrings which occur only once within the sequence or set of sequences analysed and which cannot be further reduced in length without losing the property of uniqueness. Such substrings can be detected using generalized suffix trees.
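As an illustration of the definition above, the following minimal sketch finds the globally shortest unique substrings of a single sequence by brute-force substring counting. It is not the authors' suffix-tree implementation: the function name `shortest_unique_substrings` is hypothetical, only the forward strand is considered, and the approach is practical only for short inputs.

```python
from collections import Counter


def shortest_unique_substrings(seq):
    """Return (k, substrings): the smallest length k at which some substring
    of seq occurs exactly once, and the sorted list of those substrings.

    Brute-force illustration only (assumption: forward strand, one sequence).
    """
    n = len(seq)
    for k in range(1, n + 1):
        # Count every substring of length k, including overlapping occurrences.
        counts = Counter(seq[i:i + k] for i in range(n - k + 1))
        unique = sorted(s for s, c in counts.items() if c == 1)
        if unique:
            # Every shorter substring is repeated, so these cannot be
            # shortened without losing uniqueness.
            return k, unique
    return 0, []  # empty input has no substrings


if __name__ == "__main__":
    print(shortest_unique_substrings("ACACGTGT"))
```

In "ACACGTGT" every single nucleotide occurs twice, so the shortest unique substrings have length 2. A suffix tree finds the same answer without enumerating all substrings, which is what makes the approach feasible at genome scale.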

RESULTS

We find that the shortest unique substrings in Caenorhabditis elegans, human and mouse are no longer than 11 bp in the autosomes of these organisms. In mouse and human these unique substrings are significantly clustered in upstream regions of known genes. Moreover, the probability of finding such short unique substrings in the genomes of human or mouse by chance is extremely small. We derive an analytical expression for the null distribution of shortest unique substrings, given the GC-content of the query sequences. Furthermore, we apply our method to rapidly detect unique genomic regions in the genome of Staphylococcus aureus strain MSSA476 compared to four other staphylococcal genomes.
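The abstract does not reproduce the analytical null distribution, so it is not restated here. As a rough stand-in, the sketch below estimates by simulation how long the shortest unique substring is in random sequences of a given GC-content. All function names, the independent-sites model, and the parameter values are assumptions made for illustration, not the paper's derivation.

```python
import random
from collections import Counter


def random_dna(n, gc, rng):
    """Draw n independent nucleotides with the given GC-content (assumed model)."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=n))


def shortest_unique_length(seq):
    """Smallest k such that some length-k substring occurs exactly once."""
    n = len(seq)
    for k in range(1, n + 1):
        counts = Counter(seq[i:i + k] for i in range(n - k + 1))
        if 1 in counts.values():
            return k
    return 0  # empty input


def null_distribution(n, gc, trials, seed=0):
    """Empirical distribution of the shortest unique length over random sequences."""
    rng = random.Random(seed)
    lengths = Counter(
        shortest_unique_length(random_dna(n, gc, rng)) for _ in range(trials)
    )
    return {k: c / trials for k, c in sorted(lengths.items())}


if __name__ == "__main__":
    # Hypothetical parameters: 2000 bp sequences, 40% GC, 50 replicates.
    print(null_distribution(2000, 0.4, 50))
```

Comparing an observed shortest unique length against such a simulated distribution gives an informal sense of how unexpected it is under the random model. Simulation at the scale of a real genome is impractical, which is where a closed-form null distribution is useful.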

CONCLUSION

We combine a method to rapidly search for shortest unique substrings in DNA sequences and a derivation of their null distribution. We show that unique regions in an arbitrary sample of genomes can be efficiently detected with this method. The corresponding programs shustring (SHortest Unique subSTRING) and shulen are written in C and available at http://adenine.biz.fh-weihenstephan.de/shustring/.
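To make the multi-genome case concrete, here is a small sketch that reports substrings of a fixed length k that occur exactly once across a set of sequences and lie in a chosen query sequence. It uses plain k-mer counting rather than a generalized suffix tree, and it does not reflect the interface of shustring or shulen. The function name and the fixed-k simplification are assumptions.

```python
from collections import Counter


def unique_kmers_in_query(query, others, k):
    """Return (position, k-mer) pairs for k-mers of query that occur exactly
    once in the whole sequence set (query plus others).

    Illustrative assumption: a single fixed k and forward strands only.
    """
    counts = Counter()
    for seq in [query, *others]:
        counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [
        (i, query[i:i + k])
        for i in range(len(query) - k + 1)
        if counts[query[i:i + k]] == 1
    ]


if __name__ == "__main__":
    # Toy sequences standing in for one query genome and two comparison genomes.
    print(unique_kmers_in_query("ACGTTT", ["ACGA", "CGTA"], 3))
```

Runs of adjacent positions in the output mark stretches of the query that have no counterpart in the other sequences, which is the intuition behind detecting unique genomic regions without alignment.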
