Terrier: a deep learning repeat classifier.

Authors

Turnbull Robert, Young Neil D, Tescari Edoardo, Skerratt Lee F, Kosch Tiffany A

Affiliations

Melbourne Data Analytics Platform, University of Melbourne, 700 Swanston Street, Carlton, 3053, VIC, Australia.

Faculty of Science, University of Melbourne, Grattan Street, Parkville, 3010, VIC, Australia.

Publication information

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf442.

DOI:10.1093/bib/bbaf442
PMID:40862518
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12381760/
Abstract

Repetitive DNA sequences underpin genome architecture and evolutionary processes, yet they remain challenging to classify accurately. Poor representation of taxa within repeat databases often limits the classification accuracy and reproducibility of current repeat annotation methods, which in turn limits our understanding of repeat evolution and function. Terrier is a deep learning model designed to overcome these challenges: it is trained on a publicly available, curated repeat sequence library and classifies repetitive DNA sequences under the RepeatMasker schema. Trained on Repbase, which includes over 100,000 repeat families (four times more than Dfam), Terrier maps 97.1% of Repbase sequences to RepeatMasker categories, offering the most comprehensive classification system available. When benchmarked against DeepTE, TERL, and TEclass2 in model organisms (rice, fruit flies, humans, and mice), Terrier achieved superior accuracy while classifying a broader range of sequences. Further validation in non-model amphibian, flatworm, and Northern krill genomes highlights its effectiveness in improving classification in non-model species, facilitating research on repeat-driven evolution, genomic instability, and phenotypic variation.
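The abstract reports that 97.1% of Repbase sequences map to RepeatMasker categories. As a rough illustration of how a coverage figure of this kind can be computed, the Python sketch below counts how many sequence labels in a toy list have an entry in a label-to-category table. The table `LABEL_TO_CATEGORY`, the function `mapping_coverage`, and the sample labels are all hypothetical and chosen for illustration; they are not taken from Terrier or from Repbase, and the resulting number has no relation to the published figure.

```python
from collections import Counter

# Hypothetical mapping from source-library labels to RepeatMasker-style
# "Class/Family" categories; illustrative only, not Terrier's actual table.
LABEL_TO_CATEGORY = {
    "Gypsy": "LTR/Gypsy",
    "Copia": "LTR/Copia",
    "L1": "LINE/L1",
    "hAT": "DNA/hAT",
}


def mapping_coverage(labels, mapping):
    """Return (fraction of labels that map, Counter of unmapped labels)."""
    if not labels:
        raise ValueError("labels must not be empty")
    # Count every label that has no entry in the mapping table.
    unmapped = Counter(label for label in labels if label not in mapping)
    mapped = len(labels) - sum(unmapped.values())
    return mapped / len(labels), unmapped


# Toy input: one label per sequence in a made-up library.
labels = ["Gypsy", "L1", "hAT", "Copia", "Unknown-X", "Gypsy", "L1", "hAT"]
coverage, unmapped = mapping_coverage(labels, LABEL_TO_CATEGORY)
print(f"coverage: {coverage:.1%}")
print(f"unmapped: {dict(unmapped)}")
```

Returning the unmapped labels alongside the fraction makes it easy to see which source categories fall outside the target schema, which is the part of a library a classifier trained under that schema cannot be taught to label.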

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17af/12381760/40226b81cc3e/bbaf442f9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17af/12381760/29e0797ea5c4/bbaf442ga1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17af/12381760/61e0777946e3/bbaf442f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17af/12381760/dfb317ff8835/bbaf442f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17af/12381760/7f30c8e34404/bbaf442f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17af/12381760/9fcadc77dea8/bbaf442f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17af/12381760/51ade226444f/bbaf442f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17af/12381760/8309e800bac8/bbaf442f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17af/12381760/20add3fb6842/bbaf442f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17af/12381760/e3dacc27dee3/bbaf442f8.jpg