Wang Jianan, Sun Zhenyuan, Wang Guohua, Miao Yan
Faculty of Computing, Harbin Institute of Technology, No. 92 Xidazhi Street, Nangang District, Harbin 150001, China.
College of Computer and Control Engineering, Northeast Forestry University, No. 26 He Xing Road, Xiangfang District, Harbin 150040, China.
Int J Mol Sci. 2025 Jun 13;26(12):5700. doi: 10.3390/ijms26125700.
Haplotype phasing refers to determining the haplotype sequences inherited from each parent in a diploid organism. It is a critical step for many downstream analyses, and numerous haplotype phasing methods for genomic single nucleotide polymorphisms (SNPs) have been developed. Allele-specific (AS) expression and alternative splicing play key roles in diverse biological processes. AS studies usually focus on exonic SNPs, and multiple phased SNPs need to be combined to obtain better inferences. In this paper, we introduce HPTAS, an alignment-free algorithm for haplotype phasing in AS studies. Instead of using sequence alignment to count the reads covering SNPs, HPTAS constructs a mapping structure from transcriptome annotations and SNPs and employs a k-mer-based approach to derive phasing counts from RNA-seq data. Using both next-generation sequencing (NGS) and third-generation sequencing (TGS) RNA-seq data from NA12878, and comparing against the most advanced algorithm in the field, we demonstrate that HPTAS achieves high phasing accuracy and performance and that transcriptome data indeed facilitate the phasing of exonic SNPs. With the continued advancement of sequencing technology and improvements in transcriptome annotations, HPTAS may serve as a foundation for future haplotype phasing methods.
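To illustrate the general idea of deriving phasing counts from k-mers rather than alignments, a minimal Python sketch follows. It is not the HPTAS implementation: the function names (`build_kmer_index`, `count_phasing_kmers`, `phase_pair`), the toy transcript, and the SNP coordinates are all hypothetical. The sketch also assumes that reads are already in transcript orientation, that only adjacent SNP pairs closer than k bases are phased, and that no other variants fall inside a k-mer window.

```python
from collections import Counter
from itertools import product


def build_kmer_index(transcript, snps, k):
    """Map each k-mer spanning two adjacent SNPs to (pair index, allele a, allele b).

    transcript: transcript sequence taken from the annotation (str)
    snps: list of (position, ref_base, alt_base), 0-based, sorted by position
    Allele codes: 0 = reference base, 1 = alternate base.
    """
    index = {}
    for i in range(len(snps) - 1):
        p, ref_p, alt_p = snps[i]
        q, ref_q, alt_q = snps[i + 1]
        if q - p >= k:
            continue  # no single k-mer can cover both SNPs
        for a, b in product((0, 1), repeat=2):
            seq = list(transcript)
            seq[p] = (ref_p, alt_p)[a]
            seq[q] = (ref_q, alt_q)[b]
            seq = "".join(seq)
            # Window starts that keep both SNP positions inside the k-mer.
            first = max(0, q - k + 1)
            last = min(p, len(seq) - k)
            for start in range(first, last + 1):
                index[seq[start:start + k]] = (i, a, b)
    return index


def count_phasing_kmers(reads, index, k):
    """Count reads supporting each allele combination, using k-mer lookups only."""
    counts = Counter()
    for read in reads:
        hits = {index[read[s:s + k]]
                for s in range(len(read) - k + 1)
                if read[s:s + k] in index}
        counts.update(hits)  # a read votes once per (pair, allele a, allele b)
    return counts


def phase_pair(counts, i):
    """Return 'cis' if ref/ref and alt/alt dominate, 'trans' if mixed, None on a tie."""
    cis = counts[(i, 0, 0)] + counts[(i, 1, 1)]
    trans = counts[(i, 0, 1)] + counts[(i, 1, 0)]
    if cis == trans:
        return None
    return "cis" if cis > trans else "trans"


# Toy data (hypothetical): two SNPs at transcript positions 5 (G>A) and 9 (G>T).
transcript = "ACGTTGCAAGCTTAGGCATC"
snps = [(5, "G", "A"), (9, "G", "T")]
k = 8
reads = [
    "CGTTGCAAGCTTA",  # carries G at position 5 and G at position 9
    "CGTTGCAAGCTTA",
    "CGTTACAATCTTA",  # carries A at position 5 and T at position 9
]

index = build_kmer_index(transcript, snps, k)
counts = count_phasing_kmers(reads, index, k)
print(phase_pair(counts, 0))  # cis
```

In this toy case every read supports either the ref/ref or the alt/alt combination, so the two alternate alleles are placed on the same haplotype. A practical tool would additionally need to handle reverse-complemented reads, sequencing errors, k-mers shared between transcripts, and SNP pairs joined only through splice junctions.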