Institute of Legal Medicine, Innsbruck Medical University, Innsbruck, Austria.
Institute of Mathematics, University of Innsbruck, Innsbruck, Austria.
Forensic Sci Int Genet. 2013 Dec;7(6):601-609. doi: 10.1016/j.fsigen.2013.07.005. Epub 2013 Aug 12.
The assignment of haplogroups to mitochondrial DNA haplotypes adds substantial value to quality control, not only in forensic genetics but also in population and medical genetics. The availability of Phylotree, a widely accepted phylogenetic tree of human mitochondrial DNA lineages, led to the development of several (semi-)automated software solutions for haplogrouping. However, existing haplogrouping tools use only haplogroup-defining mutations, whereas private mutations (beyond the haplogroup level) carry additional information that can improve haplogroup assignment. This is especially relevant for (partial) control region sequences, which are the main type of data used in forensics. The present study makes three major contributions toward a more reliable, semi-automated estimation of mitochondrial haplogroups. First, a quality-controlled database of 14,990 full mtGenomes downloaded from GenBank was compiled. Together with Phylotree, these mtGenomes serve as a reference database for haplogroup estimates. Second, the concept of fluctuation rates is presented, i.e. a maximum likelihood estimation of the stability of mutations based on 19,171 full control region haplotypes for which raw lane data are available. Finally, an algorithm is put forward that estimates the haplogroup of an mtDNA sequence from the combined database of full mtGenomes and Phylotree and that also incorporates the empirically determined fluctuation rates. Using examples from the literature and EMPOP, the algorithm is validated, and both the strength of this approach and its utility for quality control of mitochondrial haplotypes are demonstrated.
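The idea of weighting mutations by their stability can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the cost function, so the weighting `1 - rate`, the function names, the haplogroup labels `HG_A` and `HG_B`, the mutation strings, and the rate values below are all hypothetical and chosen only for illustration. The sketch ranks reference profiles by a weighted count of mismatching mutations, so that a mismatch at an unstable (high fluctuation rate) position costs less than one at a stable position.

```python
from typing import Dict, FrozenSet, List, Tuple

Profile = FrozenSet[str]  # a haplotype written as a set of mutation labels


def mutation_weight(mutation: str, rates: Dict[str, float],
                    default_rate: float = 0.0) -> float:
    """Assumed weighting: stable mutations (low rate) count more.

    Mutations without a known rate are treated as fully stable.
    """
    return 1.0 - rates.get(mutation, default_rate)


def profile_cost(query: Profile, reference: Profile,
                 rates: Dict[str, float]) -> float:
    """Sum the weights of all mutations present in only one of the profiles."""
    mismatches = query ^ reference  # symmetric difference
    return sum(mutation_weight(m, rates) for m in mismatches)


def rank_haplogroups(query: Profile, references: Dict[str, Profile],
                     rates: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (haplogroup, cost) pairs, lowest cost first."""
    scored = [(profile_cost(query, ref, rates), name)
              for name, ref in references.items()]
    return [(name, cost) for cost, name in sorted(scored)]


# Toy data (hypothetical, not taken from Phylotree, GenBank, or EMPOP).
references = {
    "HG_A": frozenset({"100A", "200C"}),
    "HG_B": frozenset({"100A", "300T"}),
}
rates = {"200C": 0.1, "300T": 0.9}  # 300T is assumed to be unstable
query = frozenset({"100A", "200C", "300T"})

ranked = rank_haplogroups(query, references, rates)
for name, cost in ranked:
    print(f"{name}: cost {cost:.2f}")
```

With unweighted mismatch counts the two toy haplogroups would tie at one mismatch each. Under the assumed weighting, the extra unstable mutation `300T` is penalised less than the missing stable mutation `200C`, so `HG_A` ranks first.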