Guo Kaixuan, Zhong Zhanming, Zeng Haonan, Zhang Changliang, Chitotombe Teddy Tinashe, Teng Jinyan, Gao Yahui, Zhang Zhe
State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China.
BMC Genomics. 2025 Mar 13;26(1):245. doi: 10.1186/s12864-025-11411-5.
RNA sequencing (RNA-seq) is a powerful tool for transcriptome profiling and enables integrative studies of expression quantitative trait loci (eQTL). Because RNA-seq identifies fewer genetic variants than DNA sequencing (DNA-seq), reference panel-based genotype imputation is often required to extend its utility.
This study evaluated the accuracy of genotype imputation using SNPs called from RNA-seq data (RNA-SNPs). SNP features from 6,567 RNA-seq samples across 28 pig tissues were used to mask whole-genome sequencing (WGS) data, with the Pig Genomic Reference Panel (PGRP) serving as the reference panel. Imputation was performed with three software tools: Beagle, Minimac4, and Impute5. The results showed that RNA-SNPs achieved higher imputation accuracy (concordance rate [CR]: 0.895–0.933; r²: 0.745–0.817) than SNPs from the GeneSeek Genomic Profiler Porcine SNP50 BeadChip (Chip-SNPs) (CR: 0.873–0.909; r²: 0.629–0.698), but lower accuracy in "intergenic" regions. After imputation, quality control (QC) by minor allele frequency (MAF) and imputation quality (DR²) could improve r² but reduced the number of retained SNPs. Among the tools, Minimac4 had the shortest runtime in the single-threaded setting, whereas Beagle performed best in the multi-threaded setting and in phasing. Impute5 used the least memory but had the longest runtime. All tools showed comparable global accuracy (CR: 0.906–0.917; r²: 0.780–0.787).
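To make the masking step, the two accuracy metrics, and the post-imputation QC trade-off concrete, here is a minimal Python sketch. It is not the study's pipeline. The following are all illustrative assumptions: genotype coding as alternate-allele counts (0, 1, 2) with imputed dosages in [0, 2], computing CR on dosages rounded to best-guess calls, taking r² as the squared Pearson correlation between true genotypes and dosages, the function names, and the QC thresholds. DR² is treated as a value already reported by the imputation tool rather than recomputed here.

```python
"""Hypothetical sketch: masking, imputation accuracy metrics, and post-imputation QC.

Assumptions (illustrative, not taken from the study): genotypes are coded as
alternate-allele counts (0, 1, 2), imputed values are dosages in [0, 2], CR is
computed on dosages rounded to best-guess calls, and QC thresholds are arbitrary.
"""
import math


def mask_to_panel(wgs, observed_positions):
    """Split WGS genotypes into observed (kept) and masked (to be imputed) sites.

    wgs maps position -> list of per-sample genotypes; observed_positions is the
    set of sites an RNA-SNP or chip panel would genotype.
    """
    kept = {pos: gt for pos, gt in wgs.items() if pos in observed_positions}
    masked = {pos: gt for pos, gt in wgs.items() if pos not in observed_positions}
    return kept, masked


def concordance_rate(true_gt, imputed_dosage):
    """Fraction of samples whose rounded dosage equals the true genotype."""
    # round() uses round-half-to-even; exact .5 dosages are rare in practice.
    calls = [round(d) for d in imputed_dosage]
    return sum(t == c for t, c in zip(true_gt, calls)) / len(true_gt)


def squared_correlation(true_gt, imputed_dosage):
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    n = len(true_gt)
    mx = sum(true_gt) / n
    my = sum(imputed_dosage) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_gt, imputed_dosage))
    sxx = sum((x - mx) ** 2 for x in true_gt)
    syy = sum((y - my) ** 2 for y in imputed_dosage)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined when either vector is constant
    return sxy * sxy / (sxx * syy)


def minor_allele_frequency(dosages):
    """MAF estimated from dosages: alt-allele frequency folded to <= 0.5."""
    p = sum(dosages) / (2 * len(dosages))
    return min(p, 1 - p)


def passes_qc(dosages, dr2, min_maf=0.05, min_dr2=0.8):
    """Keep a variant only if its MAF and tool-reported DR² meet the thresholds."""
    return minor_allele_frequency(dosages) >= min_maf and dr2 >= min_dr2


if __name__ == "__main__":
    truth = [0, 1, 2, 1, 0]
    dosage = [0.1, 0.9, 1.8, 0.4, 0.2]
    print("CR:", concordance_rate(truth, dosage))  # 4 of 5 calls match -> 0.8
    print("r2:", round(squared_correlation(truth, dosage), 3))
    print("passes QC:", passes_qc(dosage, dr2=0.9))
```

Raising `min_maf` or `min_dr2` discards more variants. This mirrors the trade-off described in the abstract, where stricter QC could raise r² while fewer SNPs were retained.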
This study offers practical guidance for designing RNA-SNP imputation strategies in genome and transcriptome research.