K-mer-based machine learning method to classify LTR-retrotransposons in plant genomes.

Authors

Orozco-Arias Simon, Candamil-Cortés Mariana S, Jaimes Paula A, Piña Johan S, Tabares-Soto Reinel, Guyot Romain, Isaza Gustavo

Affiliations

Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia.

Department of Systems and Informatics, Universidad de Caldas, Manizales, Caldas, Colombia.

Publication information

PeerJ. 2021 May 19;9:e11456. doi: 10.7717/peerj.11456. eCollection 2021.

DOI: 10.7717/peerj.11456
PMID: 34055489
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8140598/
Abstract

Every day more plant genomes are available in public databases and additional massive sequencing projects (i.e., that aim to sequence thousands of individuals) are formulated and released. Nevertheless, there are not enough automatic tools to analyze this large amount of genomic information. LTR retrotransposons are the most frequent repetitive sequences in plant genomes; however, their detection and classification are commonly performed using semi-automatic and time-consuming programs. Despite the availability of several bioinformatic tools that follow different approaches to detect and classify them, none of these tools can individually obtain accurate results. Here, we used Machine Learning algorithms based on k-mer counts to classify LTR retrotransposons from other genomic sequences and into lineages/families with an F1-Score of 95%, contributing to develop a free-alignment and automatic method to analyze these sequences.
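
The abstract states only that the classifiers operate on k-mer counts; it does not give the value of k, the feature layout, or the algorithms used. As a rough illustration of what a k-mer count feature vector can look like, here is a minimal Python sketch. The function name `kmer_counts`, the default `k=3`, the lexicographic ordering, and the choice to ignore windows containing non-ACGT characters are all assumptions made for this example, not details taken from the paper.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence: str, k: int = 3) -> list[int]:
    """Count overlapping DNA k-mers and return them as a fixed-length vector.

    Hypothetical helper for illustration only; the paper's actual
    feature extraction may differ.
    """
    sequence = sequence.upper()
    # Slide a window of length k over the sequence, one base at a time.
    observed = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Enumerate all 4**k k-mers over A, C, G, T in lexicographic order so that
    # every sequence maps to a vector of the same length. Windows containing
    # other symbols (e.g. N) are never looked up and are therefore ignored.
    vocabulary = ("".join(p) for p in product("ACGT", repeat=k))
    return [observed[kmer] for kmer in vocabulary]


# Example: a 16-dimensional vector of dinucleotide counts.
features = kmer_counts("ACGTAC", k=2)
```

Vectors of this kind, one per sequence, could then be passed to any standard supervised classifier. An alignment-free approach of this sort avoids comparing each sequence against a reference library.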

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/20e6/8140598/af0408e0e415/peerj-09-11456-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/20e6/8140598/586f383c3dff/peerj-09-11456-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/20e6/8140598/f6f11de0233d/peerj-09-11456-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/20e6/8140598/f9d040bda1d1/peerj-09-11456-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/20e6/8140598/acbad223391e/peerj-09-11456-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/20e6/8140598/d440a09f29ed/peerj-09-11456-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/20e6/8140598/3550da03cbb2/peerj-09-11456-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/20e6/8140598/3c24bb392324/peerj-09-11456-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/20e6/8140598/55d945cd612c/peerj-09-11456-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/20e6/8140598/e63e3e273fdf/peerj-09-11456-g010.jpg

Similar articles

1
K-mer-based machine learning method to classify LTR-retrotransposons in plant genomes.
PeerJ. 2021 May 19;9:e11456. doi: 10.7717/peerj.11456. eCollection 2021.
2
Look4LTRs: a Long terminal repeat retrotransposon detection tool capable of cross species studies and discovering recently nested repeats.
Mob DNA. 2024 Apr 16;15(1):8. doi: 10.1186/s13100-024-00317-w.
3
Automatic curation of LTR retrotransposon libraries from plant genomes through machine learning.
J Integr Bioinform. 2022 Jul 12;19(3). doi: 10.1515/jib-2021-0036. eCollection 2022 Sep 1.
4
InpactorDB: A Classified Lineage-Level Plant LTR Retrotransposon Reference Library for Free-Alignment Methods Based on Machine Learning.
Genes (Basel). 2021 Jan 28;12(2):190. doi: 10.3390/genes12020190.
5
DANTE and DANTE_LTR: lineage-centric annotation pipelines for long terminal repeat retrotransposons in plant genomes.
NAR Genom Bioinform. 2024 Aug 29;6(3):lqae113. doi: 10.1093/nargab/lqae113. eCollection 2024 Sep.
6
Inpactor2: a software based on deep learning to identify and classify LTR-retrotransposons in plant genomes.
Brief Bioinform. 2023 Jan 19;24(1). doi: 10.1093/bib/bbac511.
7
Comparative analysis of miniature inverted-repeat transposable elements (MITEs) and long terminal repeat (LTR) retrotransposons in six Citrus species.
BMC Plant Biol. 2019 Apr 15;19(1):140. doi: 10.1186/s12870-019-1757-3.
8
Inpactor, Integrated and Parallel Analyzer and Classifier of LTR Retrotransposons and Its Application for Pineapple LTR Retrotransposons Diversity and Dynamics.
Biology (Basel). 2018 May 25;7(2):32. doi: 10.3390/biology7020032.
9
LTRclassifier: A website for fast structural LTR retrotransposons classification in plants.
Mob Genet Elements. 2016 Sep 26;6(6):e1241050. doi: 10.1080/2159256X.2016.1241050. eCollection 2016.
10
LTR_STRUC: a novel search and identification program for LTR retrotransposons.
Bioinformatics. 2003 Feb 12;19(3):362-7. doi: 10.1093/bioinformatics/btf878.

Cited by

1
Detection and classification of long terminal repeat sequences in plant LTR-retrotransposons and their analysis using explainable machine learning.
BioData Min. 2024 Dec 18;17(1):57. doi: 10.1186/s13040-024-00410-z.
2
Effect of tokenization on transformers for biological sequences.
Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae196.
3
Genomic object detection: An improved approach for transposable elements detection and classification using convolutional neural networks.
PLoS One. 2023 Sep 21;18(9):e0291925. doi: 10.1371/journal.pone.0291925. eCollection 2023.
4
Reference-Free Plant Disease Detection Using Machine Learning and Long-Read Metagenomic Sequencing.
Appl Environ Microbiol. 2023 Jun 28;89(6):e0026023. doi: 10.1128/aem.00260-23. Epub 2023 May 15.
5
m6Aminer: Predicting the m6Am Sites on mRNA by Fusing Multiple Sequence-Derived Features into a CatBoost-Based Classifier.
Int J Mol Sci. 2023 Apr 26;24(9):7878. doi: 10.3390/ijms24097878.
6
Inpactor2: a software based on deep learning to identify and classify LTR-retrotransposons in plant genomes.
Brief Bioinform. 2023 Jan 19;24(1). doi: 10.1093/bib/bbac511.
7
Automatic curation of LTR retrotransposon libraries from plant genomes through machine learning.
J Integr Bioinform. 2022 Jul 12;19(3). doi: 10.1515/jib-2021-0036. eCollection 2022 Sep 1.
8
An Atlas of Plant Transposable Elements.
F1000Res. 2021 Nov 24;10:1194. doi: 10.12688/f1000research.74524.1. eCollection 2021.

References

1
TERL: classification of transposable elements by convolutional neural networks.
Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa185.
2
A comparative study of machine learning and deep learning algorithms to classify cancer types based on microarray gene expression data.
PeerJ Comput Sci. 2020 Apr 13;6:e270. doi: 10.7717/peerj-cs.270. eCollection 2020.
3
InpactorDB: A Classified Lineage-Level Plant LTR Retrotransposon Reference Library for Free-Alignment Methods Based on Machine Learning.
Genes (Basel). 2021 Jan 28;12(2):190. doi: 10.3390/genes12020190.
4
Modern deep learning in bioinformatics.
J Mol Cell Biol. 2020 Oct 30;12(11):823-827. doi: 10.1093/jmcb/mjaa030.
5
DeepTE: a computational method for de novo classification of transposons with convolutional neural network.
Bioinformatics. 2020 Aug 1;36(15):4269-4275. doi: 10.1093/bioinformatics/btaa519.
6
A systematic review of the application of machine learning in the detection and classification of transposable elements.
PeerJ. 2019 Dec 18;7:e8311. doi: 10.7717/peerj.8311. eCollection 2019.
7
Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline.
Genome Biol. 2019 Dec 16;20(1):275. doi: 10.1186/s13059-019-1905-y.
8
Retrotransposons in Plant Genomes: Structure, Identification, and Classification through Bioinformatics and Machine Learning.
Int J Mol Sci. 2019 Aug 6;20(15):3837. doi: 10.3390/ijms20153837.
9
Deep learning: new computational modelling techniques for genomics.
Nat Rev Genet. 2019 Jul;20(7):389-403. doi: 10.1038/s41576-019-0122-6.
10
RepetDB: a unified resource for transposable element references.
Mob DNA. 2019 Jan 29;10:6. doi: 10.1186/s13100-019-0150-y. eCollection 2019.