Bandelt H J, Macaulay V, Richards M
Fachbereich Mathematik, Universität Hamburg, Bundesstrasse 55, Hamburg, D-20146, Germany.
Mol Phylogenet Evol. 2000 Jul;16(1):8-28. doi: 10.1006/mpev.2000.0792.
Molecular data sets characterized by few phylogenetically informative characters with a broad spectrum of mutation rates, such as intraspecific control-region sequence variation in human mitochondrial DNA (mtDNA), can be usefully visualized as median networks. Here we provide a step-by-step guide to constructing such networks by hand. We improve upon a previously implemented algorithm by outlining greedy reduction, an efficient parametrized strategy amenable to large data sets that makes it possible to reconstruct some of the confounding recurrent mutations. This strategy also entails some postprocessing, which helps capture more parsimonious solutions. To simplify drawing the resulting network by hand, we describe a speedy approach to network construction based on careful planning of the processing order. A coalescent simulation tailored to human mtDNA variation in Eurasia attests to the usefulness of reduced median networks, while highlighting notorious problems faced by all phylogenetic methods in this context. Finally, we discuss two case studies that compare characters in the two hypervariable segments of the human mtDNA control region in the light of the worldwide control-region sequence database, together with additional restriction fragment length polymorphism (RFLP) information. We conclude that only a minority of the mutations that hit the second segment occur at sites with a mutation rate comparable to that of most sites in the first segment. Discarding the known "noisy" sites of the second segment improves the analysis.
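The abstract presupposes the median operation on which these networks rest. The following is a minimal sketch, not the authors' speedy hand-construction procedure or their greedy reduction (neither is reproduced here). It assumes binary characters coded 0/1 relative to a reference sequence, with no two sites splitting the sequences identically. Under those assumptions the median of three sequences is their site-wise majority, the full (unreduced) median network is obtained by adding medians until no new sequence appears, and pairs of sites showing all four state combinations (incompatible characters, as produced for example by recurrent mutation) are what generate cycles in the network. The function names and toy data are hypothetical.

```python
from itertools import combinations


def median(a: str, b: str, c: str) -> str:
    """Site-wise majority of three binary (0/1) sequences, i.e. their median."""
    # For binary states, if x agrees with y or z it is the majority;
    # otherwise y and z must agree with each other.
    return "".join(x if x == y or x == z else y for x, y, z in zip(a, b, c))


def median_closure(seqs):
    """Add medians of all triples repeatedly until no new sequence appears.

    Assumes polarized binary characters with no duplicated splits; the result
    is the node set of the full (unreduced) median network.
    """
    nodes = set(seqs)
    while True:
        new = {median(a, b, c) for a, b, c in combinations(sorted(nodes), 3)} - nodes
        if not new:
            return nodes
        nodes |= new


def edges(nodes):
    """Link nodes that differ at exactly one site (one mutation)."""
    return {
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if sum(x != y for x, y in zip(a, b)) == 1
    }


def incompatible_pairs(seqs):
    """Site pairs where all four combinations 00, 01, 10, 11 occur.

    Each such pair forces a cycle (reticulation) in the median network.
    """
    n = len(seqs[0])
    return [
        (i, j)
        for i, j in combinations(range(n), 2)
        if len({(s[i], s[j]) for s in seqs}) == 4
    ]


if __name__ == "__main__":
    toy = ["000", "110", "101", "011"]  # hypothetical toy data
    net = median_closure(toy)
    print(len(net), len(edges(net)))  # 8 12
    print(incompatible_pairs(toy))    # [(0, 1), (0, 2), (1, 2)]
```

In the toy example, three pairwise incompatible sites already force a full 3-cube of eight nodes and twelve edges, four of which are inferred medians. This illustrates why unreduced median networks become unwieldy for hypervariable data and why a reduction step that resolves some recurrent mutations is useful.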