SAM: String-based sequence search algorithm for mitochondrial DNA database queries.

Affiliation

Institute of Mathematics, University of Innsbruck, Technikerstrasse 13, 6020 Innsbruck, Austria.

Publication information

Forensic Sci Int Genet. 2011 Mar;5(2):126-32. doi: 10.1016/j.fsigen.2010.10.006. Epub 2010 Nov 5.

DOI:10.1016/j.fsigen.2010.10.006

PMID:21056022

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3064999/

Abstract

The analysis of the haploid mitochondrial (mt) genome has numerous applications in forensic and population genetics, as well as in disease studies. Although mtDNA haplotypes are usually determined by sequencing, they are rarely reported as a nucleotide string. Traditionally they are presented in a difference-coded position-based format relative to the corrected version of the first sequenced mtDNA. This convention requires recommendations for standardized sequence alignment that is known to vary between scientific disciplines, even between laboratories. As a consequence, database searches that are vital for the interpretation of mtDNA data can suffer from biased results when query and database haplotypes are annotated differently. In the forensic context that would usually lead to underestimation of the absolute and relative frequencies. To address this issue we introduce SAM, a string-based search algorithm that converts query and database sequences to position-free nucleotide strings and thus eliminates the possibility that identical sequences will be missed in a database query. The mere application of a BLAST algorithm would not be a sufficient remedy as it uses a heuristic approach and does not address properties specific to mtDNA, such as phylogenetically stable but also rapidly evolving insertion and deletion events. The software presented here provides additional flexibility to incorporate phylogenetic data, site-specific mutation rates, and other biologically relevant information that would refine the interpretation of mitochondrial DNA data. The manuscript is accompanied by freeware and example data sets that can be used to evaluate the new software (http://stringvalidation.org).
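The core idea in the abstract, converting difference-coded haplotypes to position-free nucleotide strings before comparing them, can be illustrated with a minimal sketch. This is not the SAM implementation: the toy reference sequence, the simplified variant notation (`7C` for a substitution, `7del` for a deletion, `2.1C` for an insertion after position 2), and the function names `to_string` and `search` are all assumptions made for illustration, and only exact matches are handled.

```python
import re

# Toy reference for illustration only; this is NOT the real rCRS.
# Positions are 1-based, as in difference-coded mtDNA notation.
TOY_REFERENCE = "ACCCCCTG"

# Hypothetical simplified notation: "7C" (substitution),
# "7del" (deletion), "2.1C" (first inserted base after position 2).
_VARIANT = re.compile(r"^(\d+)(?:\.(\d+))?([ACGT]|del)$")


def to_string(variants, reference=TOY_REFERENCE):
    """Apply difference-coded variants to the reference and
    return the resulting position-free nucleotide string."""
    bases = list(reference)   # base at each reference position
    inserts = {}              # position -> {insertion index: base}
    for v in variants:
        m = _VARIANT.match(v)
        if m is None:
            raise ValueError(f"unrecognised variant: {v!r}")
        pos, ins_idx, allele = int(m.group(1)), m.group(2), m.group(3)
        if not 1 <= pos <= len(reference):
            raise ValueError(f"position out of range: {v!r}")
        if ins_idx is not None:
            if allele == "del":
                raise ValueError(f"insertion needs a base: {v!r}")
            inserts.setdefault(pos, {})[int(ins_idx)] = allele
        elif allele == "del":
            bases[pos - 1] = ""       # drop the reference base
        else:
            bases[pos - 1] = allele   # substitute the reference base
    out = []
    for i, base in enumerate(bases, start=1):
        out.append(base)
        # Inserted bases follow their anchor position, in index order.
        for k in sorted(inserts.get(i, {})):
            out.append(inserts[i][k])
    return "".join(out)


def search(query_variants, database):
    """Return names of database entries whose string equals the query's."""
    query = to_string(query_variants)
    return [name for name, variants in database.items()
            if to_string(variants) == query]


# Two laboratories annotate the same extra C in a C-stretch differently.
database = {"lab_A": ["6.1C"], "lab_B": ["2.1C"], "lab_C": ["7C"]}
print(search(["6.1C"], database))
```

In this toy case the annotations `6.1C` and `2.1C` describe the same sequence, so a comparison of the annotations themselves would miss `lab_B`, while the string comparison finds both entries. The published software goes well beyond exact string equality, as the abstract notes.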

Similar articles

Next generation database search algorithm for forensic mitogenome analyses.

Forensic Sci Int Genet. 2018 Nov;37:204-214. doi: 10.1016/j.fsigen.2018.09.001. Epub 2018 Sep 9.

DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing.

Forensic Sci Int Genet. 2014 Nov;13:134-42. doi: 10.1016/j.fsigen.2014.07.010. Epub 2014 Jul 29.

Inspecting close maternal relatedness: Towards better mtDNA population samples in forensic databases.

Forensic Sci Int Genet. 2011 Mar;5(2):138-41. doi: 10.1016/j.fsigen.2010.10.001. Epub 2010 Nov 9.

HAPLOFIND: a new method for high-throughput mtDNA haplogroup assignment.

Hum Mutat. 2013 Sep;34(9):1189-94. doi: 10.1002/humu.22356. Epub 2013 Jun 12.

Fine-Tuning Phylogenetic Alignment and Haplogrouping of mtDNA Sequences.

Int J Mol Sci. 2021 May 27;22(11):5747. doi: 10.3390/ijms22115747.

EMPOP--a forensic mtDNA database.

Forensic Sci Int Genet. 2007 Jun;1(2):88-92. doi: 10.1016/j.fsigen.2007.01.018. Epub 2007 Mar 7.

Canis mtDNA HV1 database: a web-based tool for collecting and surveying Canis mtDNA HV1 haplotype in public database.

BMC Genet. 2017 Jun 26;18(1):60. doi: 10.1186/s12863-017-0528-0.

Evaluation of GeneMarker HTS for improved alignment of mtDNA MPS data, haplotype determination, and heteroplasmy assessment.

Forensic Sci Int Genet. 2017 May;28:90-98. doi: 10.1016/j.fsigen.2017.01.016. Epub 2017 Feb 6.

mtDNAmanager: a Web-based tool for the management and quality analysis of mitochondrial DNA control-region sequences.

BMC Bioinformatics. 2008 Nov 17;9:483. doi: 10.1186/1471-2105-9-483.

Cited by

mitoLEAF: mitochondrial DNA Lineage, Evolution, Annotation Framework.

NAR Genom Bioinform. 2025 Jun 11;7(2):lqaf079. doi: 10.1093/nargab/lqaf079. eCollection 2025 Jun.

Fine-Tuning Phylogenetic Alignment and Haplogrouping of mtDNA Sequences.

Int J Mol Sci. 2021 May 27;22(11):5747. doi: 10.3390/ijms22115747.

Graph Algorithms for Mixture Interpretation.

Genes (Basel). 2021 Jan 27;12(2):185. doi: 10.3390/genes12020185.

Claudin-7 indirectly regulates the integrin/FAK signaling pathway in human colon cancer tissue.

J Hum Genet. 2016 Aug;61(8):711-20. doi: 10.1038/jhg.2016.35. Epub 2016 Apr 28.

Length heteroplasmy of the polyC-polyT-polyC stretch in the dog mtDNA control region.

Int J Legal Med. 2015 Sep;129(5):927-35. doi: 10.1007/s00414-014-1106-x. Epub 2014 Nov 14.

Reviewing population studies for forensic purposes: Dog mitochondrial DNA.

Zookeys. 2013 Dec 30(365):381-411. doi: 10.3897/zookeys.365.5859.

Concept for estimating mitochondrial DNA haplogroups using a maximum likelihood approach (EMMA).

Forensic Sci Int Genet. 2013 Dec;7(6):601-609. doi: 10.1016/j.fsigen.2013.07.005. Epub 2013 Aug 12.

Mitochondrial DNA 4977 bp deletion is a common phenomenon in hair and increases with age.

Bosn J Basic Med Sci. 2012 Aug;12(3):187-92. doi: 10.17305/bjbms.2012.2480.

References

Mitochondrial control region sequences from an African American population sample.

Forensic Sci Int Genet. 2009 Dec;4(1):e45-52. doi: 10.1016/j.fsigen.2009.04.010. Epub 2009 May 31.

Forensic and phylogeographic characterization of mtDNA lineages from northern Thailand (Chiang Mai).

Int J Legal Med. 2009 Nov;123(6):495-501. doi: 10.1007/s00414-009-0373-4.

EMPOP--a forensic mtDNA database.

Forensic Sci Int Genet. 2007 Jun;1(2):88-92. doi: 10.1016/j.fsigen.2007.01.018. Epub 2007 Mar 7.

Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation.

Hum Mutat. 2009 Feb;30(2):E386-94. doi: 10.1002/humu.20921.

Consistent treatment of length variants in the human mtDNA control region: a reappraisal.

Int J Legal Med. 2008 Jan;122(1):11-21. doi: 10.1007/s00414-006-0151-5. Epub 2007 Mar 9.

Generating population data for the EMPOP database - an overview of the mtDNA sequencing and data evaluation processes considering 273 Austrian control region sequences as example.

Forensic Sci Int. 2007 Mar 2;166(2-3):164-75. doi: 10.1016/j.forsciint.2006.05.006. Epub 2006 Jul 7.

Mitochondrial DNA control region sequences from Nairobi (Kenya): inferring phylogenetic parameters for the establishment of a forensic database.

Int J Legal Med. 2004 Oct;118(5):294-306. doi: 10.1007/s00414-004-0466-z.

Recommendations for consistent treatment of length variants in the human mitochondrial DNA control region.

Forensic Sci Int. 2002 Sep 10;129(1):35-42. doi: 10.1016/s0379-0738(02)00206-2.

Considerations by the European DNA profiling (EDNAP) group on the working practices, nomenclature and interpretation of mitochondrial DNA profiles.

Forensic Sci Int. 2001 Dec 15;124(1):83-91. doi: 10.1016/s0379-0738(01)00573-4.

DNA Commission of the International Society for Forensic Genetics: guidelines for mitochondrial DNA typing.

Int J Legal Med. 2000;113(4):193-6. doi: 10.1007/s004140000149.
