A machine learning based framework to identify and classify long terminal repeat retrotransposons.

Affiliations

Department of Computer Science, KU Leuven, Leuven, Belgium.

Department of Public Health and Primary Care, KU Leuven Kulak, Kortrijk, Belgium.

Publication information

PLoS Comput Biol. 2018 Apr 23;14(4):e1006097. doi: 10.1371/journal.pcbi.1006097. eCollection 2018 Apr.

DOI:10.1371/journal.pcbi.1006097
PMID:29684010
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5933816/
Abstract

Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of the TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-Learner, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework for LTR retrotransposons, a particular type of TE characterized by long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana, and we compare our results for three LTR retrotransposon superfamilies with the results of three widely used methods for TE identification or classification: RepeatMasker, Censor and LtrDigest. In contrast to these methods, TE-Learner is the first to incorporate machine learning techniques, outperforming these methods in terms of predictive performance while being able to learn models and make predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above methods could find, and we investigated TE-Learner's predictions that did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE.
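
As a rough illustration of what evaluating predictions against a well-annotated genome can involve, the sketch below scores predicted TE intervals against reference annotations using a reciprocal-overlap rule. This is not the evaluation protocol of the paper: the abstract does not specify one, so the matching rule, the 0.5 threshold, the function names and the toy coordinates are all assumptions made for illustration only.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; a hypothetical representation.
Interval = Tuple[str, int, int]


def overlap_fraction(a: Interval, b: Interval) -> float:
    """Fraction of interval `a` covered by `b` (0.0 on different chromosomes)."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    length = a[2] - a[1]
    return max(shared, 0) / length if length > 0 else 0.0


def matches(pred: Interval, ref: Interval, min_frac: float = 0.5) -> bool:
    """Reciprocal-overlap test: each interval must cover min_frac of the other."""
    return (overlap_fraction(pred, ref) >= min_frac
            and overlap_fraction(ref, pred) >= min_frac)


def score(predictions: List[Interval], annotations: List[Interval],
          min_frac: float = 0.5) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predictions against annotations."""
    # A prediction is a hit if it matches at least one annotated element.
    hit_preds = sum(any(matches(p, r, min_frac) for r in annotations)
                    for p in predictions)
    # An annotated element is recovered if at least one prediction matches it.
    hit_refs = sum(any(matches(p, r, min_frac) for p in predictions)
                   for r in annotations)
    precision = hit_preds / len(predictions) if predictions else 0.0
    recall = hit_refs / len(annotations) if annotations else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data, made up for illustration (not taken from the paper).
annotated = [("2L", 100, 600), ("2L", 2000, 2500), ("3R", 50, 450)]
predicted = [("2L", 120, 610), ("2L", 5000, 5400), ("3R", 60, 440)]
precision, recall, f1 = score(predicted, annotated)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Under such a scheme, a prediction that matches no annotation counts against precision. As the abstract notes, some unmatched predictions may still be strongly homologous to a known TE, so a low-precision call is not necessarily a wrong one.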

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f39/5933816/fbf9c45bfaf6/pcbi.1006097.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f39/5933816/e9e4ec8121aa/pcbi.1006097.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f39/5933816/04fa67e2e775/pcbi.1006097.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f39/5933816/3d495ee12066/pcbi.1006097.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f39/5933816/c48fcfb81409/pcbi.1006097.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f39/5933816/f97d07cf01f3/pcbi.1006097.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f39/5933816/5f56564a9bd5/pcbi.1006097.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f39/5933816/174cf426918f/pcbi.1006097.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f39/5933816/b912ff64d483/pcbi.1006097.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f39/5933816/88148cdb1130/pcbi.1006097.g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f39/5933816/f9c68132e287/pcbi.1006097.g011.jpg
