Introduction of the Python script STRinNGS for analysis of STR regions in FASTQ or BAM files and expansion of the Danish STR sequence database to 11 STRs.

Authors

Friis Susanne L, Buchard Anders, Rockenbauer Eszter, Børsting Claus, Morling Niels

Affiliation

Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark.

Publication information

Forensic Sci Int Genet. 2016 Mar;21:68-75. doi: 10.1016/j.fsigen.2015.12.006. Epub 2015 Dec 12.

DOI:10.1016/j.fsigen.2015.12.006

PMID:26722765

Abstract

This work introduces the in-house developed Python application STRinNGS for analysis of STR sequence elements in BAM or FASTQ files. STRinNGS identifies sequence reads with STR loci by their flanking sequences, analyses the STR sequence and the flanking regions, and generates a report with the assigned SNP-STR alleles. The main output file from STRinNGS contains all sequences with read counts above 1% of the total number of reads per locus. STR sequences are automatically named according to the nomenclature used previously and according to the repeat unit definitions in STRBase (http://www.cstl.nist.gov/strbase/). The sequences are named with (1) the locus name, (2) the length of the repeat region divided by the length of the repeat unit, (3) the sequence(s) of the repeat unit(s) followed by the number of repeats and (4) variations in the flanking regions. Lower case letters in the main output file are used to flag sequences with previously unknown variations in the STRs. SNPs in the flanking regions are named by their "rs" numbers and the nucleotides in the SNP position. Data from 207 Danes sequenced with the Ion Torrent™ HID STR 10-plex, which amplifies nine STRs (CSF1PO, D3S1358, D5S818, D7S820, D8S1179, D16S539, TH01, TPOX, vWA) and Amelogenin, were analysed with STRinNGS. Sequencing uncovered five common SNPs near four STRs and revealed 20 new alleles in the 207 Danes. Three short homopolymers in the D8S1179 flanking regions caused frequent sequencing errors. In 29 of 3726 allele calls (0.8%), sequences with homopolymer errors were falsely assigned as true alleles. An in-house developed script in R compensated for these errors by compiling sequence reads that had identical STR sequences and identical nucleotides in the five common SNPs. In the output file from the R script, all SNP-STR haplotype calls were correct.
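The flank-based read identification, the 1% read-count threshold and parts (1) to (3) of the naming scheme described above can be illustrated with a minimal Python sketch. This is not the STRinNGS source code: the locus name `DEMO`, both flank sequences, the function names and the in-frame chunking of the repeat region are assumptions made for illustration only. Flanking-region variations and "rs"-named SNPs (part 4) are not modelled.

```python
from collections import Counter
from itertools import groupby

# Hypothetical locus definition; these are NOT real STRinNGS settings.
LOCUS = "DEMO"
LEFT_FLANK = "ACGTAC"
RIGHT_FLANK = "TTGACC"
UNIT_LEN = 4  # length of the repeat unit (tetranucleotide)


def extract_str(read, left=LEFT_FLANK, right=RIGHT_FLANK):
    """Return the sequence between the two flanks, or None if a flank is missing."""
    start = read.find(left)
    if start == -1:
        return None
    start += len(left)
    end = read.find(right, start)
    if end == -1:
        return None
    return read[start:end]


def name_allele(locus, str_seq, unit_len=UNIT_LEN):
    """Build a name of the form locus[size]unit[count]..., e.g. DEMO[3]TCTA[3].

    Simplification: the repeat region is cut into in-frame chunks of unit_len
    and consecutive identical chunks are collapsed. Incomplete repeats are
    written in the conventional forensic form (9 repeats + 3 bases = 9.3).
    """
    full, rest = divmod(len(str_seq), unit_len)
    size = f"{full}.{rest}" if rest else str(full)
    chunks = [str_seq[i:i + unit_len] for i in range(0, full * unit_len, unit_len)]
    blocks = "".join(f"{unit}[{len(list(run))}]" for unit, run in groupby(chunks))
    tail = str_seq[full * unit_len:]  # leftover bases of an incomplete repeat
    return f"{locus}[{size}]{blocks}{tail}"


def report(reads, locus=LOCUS, min_fraction=0.01):
    """Count STR sequences and keep those above min_fraction of the locus reads."""
    counts = Counter(s for s in map(extract_str, reads) if s is not None)
    total = sum(counts.values())
    return {
        name_allele(locus, seq): n
        for seq, n in counts.most_common()
        if n > min_fraction * total
    }


# Toy input: two alleles, one low-count noise sequence, one read without flanks.
demo_reads = (
    [LEFT_FLANK + "TCTA" * 3 + RIGHT_FLANK] * 120
    + [LEFT_FLANK + "TCTATCTGTCTA" + RIGHT_FLANK] * 79
    + [LEFT_FLANK + "TCTA" * 2 + RIGHT_FLANK]
    + ["GGGGGG"]
)
for name, n in report(demo_reads).items():
    print(name, n)
```

In this toy input the single noise read falls below the 1% threshold and the read without flanks is ignored, so only the two frequent sequences are reported.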
The 207 samples and six additional samples were sequenced for D3S1358, D12S391, and D21S11 using the 454 GS Junior platform in this and a previous work. Overall, next generation sequencing (NGS) of the 11 STRs lowered the mean match probability 386 times and increased the typical paternity indexes (i.e. the geometric mean) for trios and duos 47 and 23 times, respectively, compared to the traditional PCR-CE typing of the same population.
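The "typical" paternity index mentioned above is a geometric mean, and the reported improvements are ratios of such summary values between NGS and PCR-CE typing. A minimal Python sketch of that calculation follows. The paternity index values in it are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the study.

```python
import math


def geometric_mean(values):
    """Geometric mean computed as exp(mean(log(x))), which avoids huge products."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical paternity indexes for the same cases under two typing methods.
pi_ce = [1e4, 1e6, 1e8]
pi_ngs = [1e6, 1e8, 1e10]

# Fold increase of the typical (geometric mean) paternity index.
fold_increase = geometric_mean(pi_ngs) / geometric_mean(pi_ce)
print(f"typical PI increased about {fold_increase:.0f}-fold")
```

The geometric mean is the natural summary here because paternity indexes are products of per-locus likelihood ratios and span many orders of magnitude.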

Similar articles

The Danish STR sequence database: duplicate typing of 363 Danes with the ForenSeq™ DNA Signature Prep Kit.

Int J Legal Med. 2019 Mar;133(2):325-334. doi: 10.1007/s00414-018-1854-0. Epub 2018 May 24.

STR allele sequence variation: Current knowledge and future issues.

Forensic Sci Int Genet. 2015 Sep;18:118-30. doi: 10.1016/j.fsigen.2015.06.005. Epub 2015 Jul 6.

STRinNGS v2.0: Improved tool for analysis and reporting of STR sequencing data.

Forensic Sci Int Genet. 2020 Sep;48:102331. doi: 10.1016/j.fsigen.2020.102331. Epub 2020 Jun 20.

Massively parallel sequence data of 31 autosomal STR loci from 496 Spanish individuals revealed concordance with CE-STR technology and enhanced discrimination power.

Forensic Sci Int Genet. 2019 Sep;42:49-55. doi: 10.1016/j.fsigen.2019.06.009. Epub 2019 Jun 14.

Second generation sequencing of three STRs D3S1358, D12S391 and D21S11 in Danes and a new nomenclature for sequenced STR alleles.

Forensic Sci Int Genet. 2014 Sep;12:38-41. doi: 10.1016/j.fsigen.2014.04.016. Epub 2014 May 10.

Second-generation sequencing of forensic STRs using the Ion Torrent™ HID STR 10-plex and the Ion PGM™.

Forensic Sci Int Genet. 2015 Jan;14:132-40. doi: 10.1016/j.fsigen.2014.09.020. Epub 2014 Oct 5.

Evaluation of the Early Access STR Kit v1 on the Ion Torrent PGM™ platform.

Forensic Sci Int Genet. 2016 Jul;23:111-120. doi: 10.1016/j.fsigen.2016.04.004. Epub 2016 Apr 4.

Towards simultaneous individual and tissue identification: A proof-of-principle study on parallel sequencing of STRs, amelogenin, and mRNAs with the Ion Torrent PGM.

Forensic Sci Int Genet. 2015 Jul;17:122-128. doi: 10.1016/j.fsigen.2015.04.002. Epub 2015 Apr 17.

Sequence-based diversity of 23 autosomal STR loci in Koreans investigated using an in-house massively parallel sequencing panel.

Forensic Sci Int Genet. 2017 Sep;30:134-140. doi: 10.1016/j.fsigen.2017.07.001. Epub 2017 Jul 9.

Cited by

STRaM: A genetic framework for improved cell product provenance for research and clinical translations.

Commun Biol. 2025 Aug 15;8(1):1232. doi: 10.1038/s42003-025-08547-1.

Transcriptome Analysis and HPLC Profiling of Flavonoid Biosynthesis in L. during Its Key Developmental Stages.

Biology (Basel). 2022 Jul 19;11(7):1078. doi: 10.3390/biology11071078.

Sequencing of human identification markers in an Uyghur population using the MiSeq FGx Forensic Genomics System.

Forensic Sci Res. 2020 Sep 10;7(2):154-162. doi: 10.1080/20961790.2020.1779967. eCollection 2022.

Characterization of 58 STRs and 94 SNPs with the ForenSeq™ DNA signature prep kit in Mexican-Mestizos from the Monterrey city (Northeast, Mexico).

Mol Biol Rep. 2022 Aug;49(8):7601-7609. doi: 10.1007/s11033-022-07575-y. Epub 2022 Jun 3.

An Introductory Overview of Open-Source and Commercial Software Options for the Analysis of Forensic Sequencing Data.

Genes (Basel). 2021 Oct 29;12(11):1739. doi: 10.3390/genes12111739.

Identification of sequence polymorphisms at 58 STRs and 94 iiSNPs in a Tibetan population using massively parallel sequencing.

Sci Rep. 2020 Jul 22;10(1):12225. doi: 10.1038/s41598-020-69137-1.

Report from the STRAND Working Group on the 2019 STR sequence nomenclature meeting.

Forensic Sci Int Genet. 2019 Nov;43:102165. doi: 10.1016/j.fsigen.2019.102165. Epub 2019 Sep 21.

Sequencing of 231 forensic genetic markers using the MiSeq FGx™ forensic genomics system - an evaluation of the assay and software.

Forensic Sci Res. 2018 Apr 9;3(2):111-123. doi: 10.1080/20961790.2018.1446672. eCollection 2018.

Sequence-based U.S. population data for 27 autosomal STR loci.

Forensic Sci Int Genet. 2018 Nov;37:106-115. doi: 10.1016/j.fsigen.2018.07.013. Epub 2018 Jul 19.

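The Danish STR sequence database: duplicate typing of 363 Danes with the ForenSeq™ DNA Signature Prep Kit.

Int J Legal Med. 2019 Mar;133(2):325-334. doi: 10.1007/s00414-018-1854-0. Epub 2018 May 24.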
