Synthetic Biology Division, Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo 153-8904, Japan.
Institute for Advanced Biosciences, Keio University, Tsuruoka 997-0035, Japan.
Nucleic Acids Res. 2019 Jan 25;47(2):e8. doi: 10.1093/nar/gky890.
Periodically repeating DNA and protein elements are involved in many important biological processes, including genomic evolution, gene regulation, protein complex formation, and immunity. Notably, the genome editing tools in current use, such as ZFNs, TALENs, and CRISPRs, are all associated with periodically repeating biomolecules found in natural organisms. Despite the biological importance of periodically repeating sequences and the expectation that new genome editing modules could be discovered from such periodic repeats, no software has been developed that globally detects these structured elements in large genomic resources in a high-throughput, unsupervised manner. We developed new software, SPADE (Search for Patterned DNA Elements), that exhaustively explores periodic DNA and protein repeats in large-scale genomic datasets based on k-mer periodicity evaluation. Using only a simple constraint, sequence periodicity, SPADE captured reported genome-editing-associated sequences and other protein families containing repeating domains, such as tetratricopeptide, ankyrin, and WD40 repeats, and performed better than other software designed for limited sets of repetitive biomolecular sequences. This suggests that the software has high potential to contribute to the discovery of new biological events and new genome editing modules.
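To make the idea of "k-mer periodicity evaluation" concrete, the following minimal sketch counts the distances between consecutive occurrences of each k-mer and reports the most frequent distance as a candidate period. It is an illustration only and is not SPADE's actual algorithm: the function names `kmer_distance_histogram` and `dominant_period`, the default `k=3`, and the "most common distance" rule are all assumptions made for this example.

```python
from collections import Counter


def kmer_distance_histogram(seq, k=3):
    """Count distances between consecutive occurrences of each k-mer.

    Hypothetical helper for illustration; not part of SPADE.
    """
    last_seen = {}          # k-mer -> index of its most recent occurrence
    distances = Counter()   # distance -> number of times it was observed
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            distances[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return distances


def dominant_period(seq, k=3):
    """Return the most frequent k-mer recurrence distance, or None.

    None is returned when no k-mer occurs more than once.
    """
    hist = kmer_distance_histogram(seq, k)
    if not hist:
        return None
    return hist.most_common(1)[0][0]


if __name__ == "__main__":
    repeat = "ACGTTGCA" * 5          # an 8-nt unit repeated five times
    print(dominant_period(repeat))   # candidate period of the toy sequence
```

In a sequence with a strong periodic repeat, most k-mers recur at the repeat-unit length, so that distance dominates the histogram. The same counting works unchanged on protein sequences, since it only treats the input as a string.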