Zhang Xiaolong, Ji Xianchao, Wang Lingxiang, Chi Lianjiang, Li Chengtao, Wen Shaoqing, Chen Hua
Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae637.
Short tandem repeats (STRs) are among the most polymorphic variants in the human genome and are widely used in forensics, population genetics and medical genetics. Compared with the traditional capillary electrophoresis (CE) method, genotyping STRs with massively parallel sequencing offers higher sensitivity and accuracy. However, current methods are designed mainly for targeted sequencing, which provides high coverage at specific STR loci, and this limits the use of STRs in low- and medium-coverage whole genome sequencing (WGS) data. Here, we introduce STRsensor, a method for typing STR alleles in both low-coverage WGS data and targeted sequencing data with a high detection ratio and accuracy. STRsensor employs two approaches to STR allele typing: a k-mer-based method and a CIGAR-based method. Furthermore, by incorporating a model of PCR stutter, STRsensor greatly improves the accuracy of STR allele typing. On simulated data, we demonstrate that STRsensor achieves a detection ratio of 100% and an accuracy of 99.37% for 30× WGS data, outperforming existing methods such as STRait Razor, STRinNGS and HipSTR. When applied to real targeted sequencing data from 687 individuals, STRsensor achieves a detection ratio of 99.64% and an accuracy of 99.99%. Moreover, STRsensor is computationally efficient, running 79 times faster than HipSTR and 10 000 times faster than STRinNGS. STRsensor is freely available on GitHub: https://github.com/ChenHuaLab/STRsensor.
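The abstract names a CIGAR-based typing method but does not describe how it works. As a rough illustration of the general idea only, and not STRsensor's actual implementation, the sketch below estimates the STR allele carried by a single aligned read from the insertions and deletions its CIGAR string places inside the repeat. It assumes 0-based, half-open reference coordinates, a reference repeat with a known unit count and motif length, and reads that span the whole repeat. The helper names `parse_cigar`, `str_length_change` and `allele_repeat_count` are hypothetical.

```python
from __future__ import annotations

import re

# SAM CIGAR operations: M/=/X consume read and reference, I and S consume the
# read only, D and N consume the reference only, H and P consume neither.
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '25M4I30M' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in _CIGAR_RE.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def str_length_change(pos: int, cigar: str, str_start: int, str_end: int) -> int | None:
    """Net bases inserted (+) or deleted (-) within the reference STR interval.

    Hypothetical helper. The STR occupies [str_start, str_end) and pos is the
    0-based reference position of the first aligned base. Returns None when
    the read does not span the whole repeat plus at least one flanking base
    on each side, since such a read cannot measure the allele length.
    """
    ref = pos
    delta = 0
    for length, op in parse_cigar(cigar):
        if op in "M=X":
            ref += length
        elif op == "D":
            # Count only the deleted bases that fall inside the STR interval.
            overlap = min(ref + length, str_end) - max(ref, str_start)
            if overlap > 0:
                delta -= overlap
            ref += length
        elif op == "N":
            ref += length  # skipped reference region, not treated as a deletion
        elif op == "I":
            # An insertion lies between ref-1 and ref; aligners often shift
            # indels to a repeat boundary, so boundary insertions are counted.
            if str_start <= ref <= str_end:
                delta += length
        # S, H and P do not advance the reference position.
    if pos >= str_start or ref <= str_end:
        return None
    return delta


def allele_repeat_count(ref_units: int, motif_len: int, delta: int) -> float:
    """Reference repeat count adjusted by the observed length change."""
    return ref_units + delta / motif_len
```

For a hypothetical (AT)10 repeat at [100, 120), a read aligned at position 80 with CIGAR `25M4I30M` yields a length change of +4 and therefore 12 repeat units, whereas `30M6D20M` yields -6 and 7 units. Per-read estimates like these would still need to be aggregated across reads into a genotype, and PCR stutter products (typically one repeat unit shorter than the true allele) are exactly the kind of artifact a stutter model has to separate from real alleles; this sketch does not model stutter.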