ClassifyTE: a stacking-based prediction of hierarchical classification of transposable elements.

Author information

Panta Manisha, Mishra Avdesh, Hoque Md Tamjidul, Atallah Joel

Affiliations

Department of Computer Science, University of New Orleans, New Orleans, LA 70148, USA.

Department of Electrical Engineering and Computer Science, Texas A&M University-Kingsville, Kingsville, TX 78363, USA.

Publication information

Bioinformatics. 2021 Sep 9;37(17):2529-2536. doi: 10.1093/bioinformatics/btab146.

DOI: 10.1093/bioinformatics/btab146
PMID: 33682878
Abstract

MOTIVATION

Transposable Elements (TEs) or jumping genes are DNA sequences that have an intrinsic capability to move within a host genome from one genomic location to another. Studies show that the presence of a TE within or adjacent to a functional gene may alter its expression. TEs can also cause an increase in the rate of mutation and can even mediate duplications and large insertions and deletions in the genome, promoting gross genetic rearrangements. The proper classification of identified jumping genes is important for analyzing their genetic and evolutionary effects. An effective classifier, which can explain the role of TEs in germline and somatic evolution more accurately, is needed. In this study, we examine the performance of a variety of machine learning (ML) techniques and propose a robust method, ClassifyTE, for the hierarchical classification of TEs with high accuracy, using a stacking-based ML method.
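The abstract does not say which base learners or which meta-learner ClassifyTE stacks, so the following is only a minimal sketch of the general stacking idea in plain Python. Base classifiers are trained on all folds but one, their out-of-fold predictions on the held-out fold become the training input of a second-level (meta) model, and the base classifiers are then refit on the full training set. The two base learners (nearest centroid and k-nearest neighbours), the frequency-table meta-learner, the round-robin fold assignment, the function names and the toy 2-D feature vectors are all hypothetical choices for illustration, not ClassifyTE's actual components or features.

```python
from collections import Counter, defaultdict
from math import dist

# Hypothetical base learner 1: nearest-centroid classifier.
def fit_nearest_centroid(X, y):
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                 for label, rows in groups.items()}
    return lambda x: min(centroids, key=lambda label: dist(x, centroids[label]))

# Hypothetical base learner 2: k-nearest-neighbours majority vote.
def fit_knn(X, y, k=3):
    data = list(zip(X, y))
    def predict(x):
        nearest = sorted(data, key=lambda pair: dist(x, pair[0]))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict

def fit_stacking(X, y, base_fitters, n_folds=3):
    """Generic stacking: the meta level learns from out-of-fold base predictions."""
    n = len(X)
    oof = [None] * n  # out-of-fold base predictions, one tuple per sample
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        models = [fit([X[i] for i in train], [y[i] for i in train])
                  for fit in base_fitters]
        for i in held_out:
            oof[i] = tuple(model(X[i]) for model in models)

    # Hypothetical meta-learner: for each combination of base predictions,
    # count which true label it corresponded to.
    table = defaultdict(Counter)
    for preds, label in zip(oof, y):
        table[preds][label] += 1

    # Refit the base learners on all training data for prediction time.
    final_models = [fit(X, y) for fit in base_fitters]

    def predict(x):
        preds = tuple(model(x) for model in final_models)
        if preds in table:
            return table[preds].most_common(1)[0][0]
        # Unseen combination: fall back to a majority vote of the base learners.
        return Counter(preds).most_common(1)[0][0]
    return predict

# Toy, synthetic data with two well-separated classes "A" and "B".
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = ["A"] * 4 + ["B"] * 4
model = fit_stacking(X, y, [fit_nearest_centroid, fit_knn], n_folds=2)
print(model((0.2, 0.3)), model((5.5, 5.2)))  # A B
```

The out-of-fold step is what distinguishes stacking from simply voting: the meta level only ever sees base predictions on samples the base learners were not trained on. A practical version would use stratified folds and a trained meta-model such as logistic regression; scikit-learn's `StackingClassifier` packages that pattern.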

RESULTS

We propose a stacking-based approach for the hierarchical classification of TEs. When trained on three different benchmark datasets, our proposed system achieved 4%, 10.68% and 10.13% average percentage improvement (using the hF measure) compared to several state-of-the-art methods. We developed an end-to-end automated hierarchical classification tool based on the proposed approach, ClassifyTE, to classify TEs up to the super-family level. We further evaluated our method on a new TE library generated by a homology-based classification method and found relatively high concordance at higher taxonomic levels. Thus, ClassifyTE paves the way for a more accurate analysis of the role of TEs.
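The hF measure reported above is commonly defined over label sets augmented with their ancestors. Let Ĉi and Ci be the predicted and true labels of sample i together with all of their ancestors (root excluded). Then hP = Σ|Ĉi ∩ Ci| / Σ|Ĉi|, hR = Σ|Ĉi ∩ Ci| / Σ|Ci| and hF = 2·hP·hR / (hP + hR). The sketch below computes these micro-averaged values, assuming the paper follows this common definition; the abstract itself does not spell it out. Label paths of the form class/order/superfamily (for example "ClassI/LTR/Gypsy") and the function names are illustrative assumptions, not the exact label strings of the benchmark datasets.

```python
def with_ancestors(path):
    """'ClassI/LTR/Gypsy' -> {'ClassI', 'ClassI/LTR', 'ClassI/LTR/Gypsy'} (root excluded)."""
    parts = path.split("/")
    return {"/".join(parts[:depth]) for depth in range(1, len(parts) + 1)}

def hierarchical_prf(true_paths, pred_paths):
    """Micro-averaged hierarchical precision, recall and F-measure."""
    if not true_paths or len(true_paths) != len(pred_paths):
        raise ValueError("need two non-empty label lists of equal length")
    overlap = n_pred = n_true = 0
    for true_path, pred_path in zip(true_paths, pred_paths):
        t, p = with_ancestors(true_path), with_ancestors(pred_path)
        overlap += len(t & p)  # shared ancestors, including the leaf if correct
        n_pred += len(p)
        n_true += len(t)
    hp, hr = overlap / n_pred, overlap / n_true
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf

# Illustrative labels: a wrong superfamily, an exact match, and a prediction
# that stops at the order level.
true = ["ClassI/LTR/Gypsy", "ClassI/LINE/L1", "ClassII/TIR/hAT"]
pred = ["ClassI/LTR/Copia", "ClassI/LINE/L1", "ClassII/TIR"]
hp, hr, hf = hierarchical_prf(true, pred)
print(round(hp, 4), round(hr, 4), round(hf, 4))  # 0.875 0.7778 0.8235
```

Here 7 of the 8 predicted ancestor labels are correct and 7 of the 9 true ones are recovered, giving hF = 14/17. The wrong superfamily still earns credit for the shared class and order, which is why hF rewards near misses that flat accuracy would count as plain errors.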

AVAILABILITY AND IMPLEMENTATION

The source code and data are available at https://github.com/manisa/ClassifyTE.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
