Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan 450000, China.
Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae557.
Mobile genetic elements (MEs) are heritable mutagens that contribute significantly to genetic diseases. Long-read sequencing technologies can resolve large DNA fragments, which makes comprehensive detection of ME variants (MEVs) increasingly feasible. However, achieving high precision while maintaining recall remains challenging. The main reasons are that MEV signatures vary in length and are similar in content, and they are often obscured by noise in long reads. Here, we propose MEHunter, a high-performance MEV detection approach that uses a fine-tuned transformer model to identify potential MEVs with fragmented features. Benchmark experiments on both simulated and real datasets show that MEHunter consistently achieves higher accuracy and sensitivity than state-of-the-art tools. Furthermore, it can detect novel, potentially individual-specific MEVs that published population projects have overlooked.
MEHunter is available from https://github.com/120L021101/MEHunter.