Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
Bioinformatics. 2019 Nov 1;35(21):4408-4410. doi: 10.1093/bioinformatics/btz264.
Human alpha satellite and satellite 2/3 make up several percent of the human genome. However, identifying these sequences with traditional algorithms is computationally intensive. Here we develop dna-brnn, a recurrent neural network that learns the sequences of these two classes of centromeric repeats. Its annotations are highly similar to those of RepeatMasker and it is times faster. Dna-brnn explores a novel application of deep learning and may accelerate the study of the evolution of the two repeat classes.
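To make the idea of labeling repeats with a recurrent network concrete, the following is a minimal sketch of a bidirectional recurrent pass that assigns each base a probability over three classes. It is not the dna-brnn implementation: the plain tanh recurrence, the hidden size, the random untrained weights, the label names, and every function name here are assumptions chosen for illustration only.

```python
import math
import random

BASES = "ACGT"
# Hypothetical label names; the abstract only states that two repeat classes are learned.
LABELS = ("other", "alpha_satellite", "satellite_2_3")


def one_hot(base):
    """Encode a base as a 4-element vector; unknown bases such as N become all zeros."""
    return [1.0 if base == b else 0.0 for b in BASES]


def rand_matrix(rows, cols, rng):
    """Random weights stand in for trained parameters (assumption: no training here)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def run_rnn(inputs, w_in, w_rec):
    """Simple tanh recurrence; returns the hidden state at every position."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        from_input = matvec(w_in, x)
        from_state = matvec(w_rec, hidden)
        hidden = [math.tanh(p + q) for p, q in zip(from_input, from_state)]
        states.append(hidden)
    return states


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_sequence(seq, hidden_size=8, seed=0):
    """Return one probability vector over LABELS for each base in seq."""
    rng = random.Random(seed)
    w_in_f = rand_matrix(hidden_size, 4, rng)
    w_rec_f = rand_matrix(hidden_size, hidden_size, rng)
    w_in_b = rand_matrix(hidden_size, 4, rng)
    w_rec_b = rand_matrix(hidden_size, hidden_size, rng)
    w_out = rand_matrix(len(LABELS), 2 * hidden_size, rng)

    inputs = [one_hot(b) for b in seq.upper()]
    forward = run_rnn(inputs, w_in_f, w_rec_f)
    # The backward pass reads the sequence right to left, then is re-aligned.
    backward = run_rnn(inputs[::-1], w_in_b, w_rec_b)[::-1]
    # Each base is classified from the concatenated forward and backward states.
    return [softmax(matvec(w_out, f + b)) for f, b in zip(forward, backward)]
```

Calling `label_sequence("ACGTNACGT")` returns nine probability vectors, one per base. Because the weights are random, the predicted classes carry no biological meaning; the sketch shows only how context from both directions reaches each position.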