STRsensor: a computationally efficient method for STR allele-typing from massively parallel sequencing data.

Author information

Zhang Xiaolong, Ji Xianchao, Wang Lingxiang, Chi Lianjiang, Li Chengtao, Wen Shaoqing, Chen Hua

Affiliations

Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.

School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China.

Publication information

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae637.

Abstract

Short tandem repeats (STRs) are among the most polymorphic variants in the human genome and are widely used in forensics, population genetics and medical genetics. Compared with the traditional capillary electrophoresis (CE) method, genotyping STRs with massively parallel sequencing offers higher sensitivity and accuracy. However, current methods are designed mainly for targeted sequencing, which provides high coverage at specific STR loci, and this limits the use of STRs in low- and medium-coverage whole-genome sequencing (WGS) data. Here, we introduce STRsensor, a method for typing STR alleles in both low-coverage WGS data and targeted sequencing data with a high detection ratio and accuracy. STRsensor uses two approaches for STR allele typing: a k-mer-based method and a CIGAR-based method. In addition, by incorporating a model of PCR stutter, STRsensor substantially improves the accuracy of STR allele typing. On simulated data, STRsensor achieves a detection ratio of 100% and an accuracy of 99.37% for 30× WGS data, outperforming existing methods such as STRait Razor, STRinNGS and HipSTR. When applied to real targeted sequencing data from 687 individuals, STRsensor achieves a detection ratio of 99.64% and an accuracy of 99.99%. Moreover, STRsensor is computationally efficient, running 79 times faster than HipSTR and 10 000 times faster than STRinNGS. STRsensor is freely available on GitHub: https://github.com/ChenHuaLab/STRsensor.
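The abstract names the two read-level typing strategies (k-mer-based and CIGAR-based) and the PCR stutter model but does not describe them in detail. As a rough illustration only, the Python sketch below shows one plausible reading of each idea. It is not STRsensor's implementation. All function names, the flanking sequences, the 0.15 stutter ratio and the two-read minimum are hypothetical. The k-mer-style function measures the repeat as the distance between two flanking k-mers in a read. The CIGAR-style function adjusts the reference repeat length by the insertions and deletions that fall inside the repeat. The genotype caller discards any allele that is one motif shorter than a much stronger allele, on the assumption that it is back-stutter.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Matches one CIGAR operation, e.g. "30M" -> ("30", "M").
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = "MDN=X"  # operations that advance along the reference


def kmer_allele_length(read: str, left_flank: str, right_flank: str) -> Optional[int]:
    """Hypothetical k-mer-style typing: bp between two flanking k-mers in a read."""
    start = read.find(left_flank)
    if start < 0:
        return None  # left anchor not found in this read
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end < 0:
        return None  # read does not reach the right anchor
    return end - start


def cigar_allele_length(pos: int, cigar: str, str_start: int, str_end: int) -> Optional[int]:
    """Hypothetical CIGAR-style typing: reference STR length adjusted by indels.

    pos is the 0-based alignment start; the STR occupies [str_start, str_end).
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    ref_span = sum(n for n, op in ops if op in REF_CONSUMING)
    if not (pos < str_start and pos + ref_span > str_end):
        return None  # read must cover the whole repeat plus some flank
    length = str_end - str_start
    ref = pos
    for n, op in ops:
        if op == "I" and str_start <= ref <= str_end:
            length += n  # inserted bases inside (or at the edge of) the repeat
        elif op == "D":
            overlap = min(ref + n, str_end) - max(ref, str_start)
            length -= max(overlap, 0)  # deleted repeat bases only
        if op in REF_CONSUMING:
            ref += n
    return length


def call_genotype(lengths: Iterable[Optional[int]], motif_len: int,
                  stutter_ratio: float = 0.15,
                  min_reads: int = 2) -> Optional[Tuple[int, int]]:
    """Diploid call with a crude back-stutter filter (hypothetical thresholds)."""
    counts = Counter(x for x in lengths if x is not None)
    kept = {}
    for length, c in counts.items():
        parent = counts.get(length + motif_len, 0)
        # One motif shorter than a much stronger allele: treat as PCR stutter.
        if parent and c <= stutter_ratio * parent:
            continue
        if c >= min_reads:
            kept[length] = c
    top = sorted(kept, key=lambda x: (-kept[x], x))[:2]
    if not top:
        return None
    if len(top) == 1:
        return (top[0], top[0])  # homozygous
    return (min(top), max(top))


def to_repeat_label(length: int, motif_len: int) -> str:
    """Forensic-style label, e.g. 39 bp of a 4-bp motif -> '9.3'."""
    full, rem = divmod(length, motif_len)
    return f"{full}.{rem}" if rem else str(full)


if __name__ == "__main__":
    flank_l, flank_r = "ACGTTGCA", "TCCAGAGT"  # made-up flanks around a GATA repeat

    def make_read(n: int) -> str:
        return "TT" + flank_l + "GATA" * n + flank_r + "GG"

    # A 10/12 heterozygote plus a few one-unit-shorter stutter reads.
    reads = [make_read(10)] * 20 + [make_read(12)] * 18 + [make_read(9)] * 2 + [make_read(11)] * 2
    lengths = [kmer_allele_length(r, flank_l, flank_r) for r in reads]
    gt = call_genotype(lengths, motif_len=4)
    print("/".join(to_repeat_label(x, 4) for x in gt))  # prints 10/12
```

Exact string matching on the flanks keeps the sketch short. With real reads, mismatches in the flanks, reverse-complemented reads and forward stutter would also need handling.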

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/947c/11635639/4e573890ccc6/bbae637f1.jpg
