Terrier: a deep learning repeat classifier.

Authors

Turnbull Robert, Young Neil D, Tescari Edoardo, Skerratt Lee F, Kosch Tiffany A

Affiliations

Melbourne Data Analytics Platform, University of Melbourne, 700 Swanston Street, Carlton, 3053, VIC, Australia.

Faculty of Science, University of Melbourne, Grattan Street, Parkville, 3010, VIC, Australia.

Publication information

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf442.

Abstract

Repetitive DNA sequences underpin genome architecture and evolutionary processes, yet they remain challenging to classify accurately. Terrier is a deep learning model designed to overcome these challenges: it is trained on a publicly available, curated repeat sequence library and classifies repetitive DNA sequences under the RepeatMasker schema. Poor representation of taxa within repeat databases often limits the classification accuracy and reproducibility of current repeat annotation methods, which in turn limits our understanding of repeat evolution and function. Terrier addresses this by leveraging deep learning for improved accuracy. Trained on Repbase, which includes over 100,000 repeat families (four times more than Dfam), Terrier maps 97.1% of Repbase sequences to RepeatMasker categories, offering the most comprehensive classification system available. When benchmarked against DeepTE, TERL, and TEclass2 in model organisms (rice, fruit flies, humans, and mice), Terrier achieved superior accuracy while classifying a broader range of sequences. Further validation on non-model amphibian, flatworm, and Northern krill genomes highlights its effectiveness in improving classification in non-model species, facilitating research on repeat-driven evolution, genomic instability, and phenotypic variation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17af/12381760/29e0797ea5c4/bbaf442ga1.jpg
