Luo Junwei, Wang Jiaojiao, Wei Jingjing, Yan Chaokun, Luo Huimin
School of Software, Henan Polytechnic University, Century Road 2001, Jiaozuo 454003, China.
College of Chemical and Environmental Engineering, Anyang Institute of Technology, West Section of Huanghe Avenue, Anyang 455000, China.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae656.
Gene polymorphism originates from single-nucleotide polymorphisms (SNPs), and the analysis of SNPs is of great significance in genetics. A haplotype, which consists of a sequence of SNP loci, carries more genetic information than a single SNP. Haplotype assembly plays a significant role in understanding gene function, diagnosing complex diseases, and pinpointing genes in a species. We propose a novel method, DeepHapNet, which performs haplotype assembly by clustering reads and learning the correlations between read pairs. We employ a sequence model called Retentive Network (RetNet), which uses a multiscale retention mechanism to extract read features and learn the global relationships among them. Reads are then clustered with the SpectralNet model, using the feature representations learned by RetNet, and haplotypes are finally constructed from the resulting read clusters. Experiments on simulated and real datasets show that the method performs well on haplotype assembly for both diploid and polyploid genomes, using either long or short reads. The code implementation of DeepHapNet and the processing scripts for the experimental data are publicly available at https://github.com/wjj6666/DeepHapNet.
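The last step described above, constructing haplotypes from read clusters, can be illustrated with a minimal sketch. It assumes a simple per-site majority vote within each cluster, which is a common way to derive a consensus and is not necessarily the rule DeepHapNet uses. The function name `consensus_haplotypes`, the string encoding of reads over SNP sites, the `-` gap symbol, and the toy data are all hypothetical; the cluster labels stand in for the output of a clustering model such as SpectralNet.

```python
from collections import Counter


def consensus_haplotypes(reads, labels, n_clusters, gap="-"):
    """Build one consensus haplotype per read cluster by majority vote.

    reads:      equal-length strings over the SNP sites; `gap` marks a
                site that the read does not cover (illustrative encoding).
    labels:     cluster index of each read, assumed to come from an
                upstream clustering step.
    n_clusters: number of haplotypes (2 for diploid, more for polyploid).
    """
    n_sites = len(reads[0])
    haplotypes = []
    for cluster in range(n_clusters):
        members = [r for r, lab in zip(reads, labels) if lab == cluster]
        hap = []
        for site in range(n_sites):
            # Count only alleles actually observed at this site.
            counts = Counter(r[site] for r in members if r[site] != gap)
            # Leave the site as a gap when no read in the cluster covers it.
            hap.append(counts.most_common(1)[0][0] if counts else gap)
        haplotypes.append("".join(hap))
    return haplotypes


# Toy diploid example: six reads over four SNP sites, already clustered.
reads = ["01--", "011-", "-110", "10--", "100-", "-001"]
labels = [0, 0, 0, 1, 1, 1]
haps = consensus_haplotypes(reads, labels, n_clusters=2)
print(haps)  # ['0110', '1001']
```

Because each cluster is handled independently, the same sketch covers the polyploid case by raising `n_clusters`; ties between alleles are broken arbitrarily here, which a real implementation would need to handle more carefully.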