Marttinen Pekka, Baldwin Adam, Hanage William P, Dowson Chris, Mahenthiralingam Eshwar, Corander Jukka
Department of Mathematics and Statistics, University of Helsinki, FIN-00014, Finland.
BMC Bioinformatics. 2008 Oct 7;9:421. doi: 10.1186/1471-2105-9-421.
We consider the joint discovery of recombinant segments and their origins within multilocus DNA sequences from bacteria representing heterogeneous populations of fairly closely related species. Currently available recombination detection methods that provide a probabilistic characterization of uncertainty have limited applicability in practice as the number of strains in a data set increases.
We introduce a Bayesian spatial structural model representing the continuum of origins over sites within the observed sequences, including a probabilistic characterization of uncertainty related to the origin of any particular site. To enable a statistically accurate and practically feasible approach to the analysis of large-scale data sets representing a single genus, we have developed a novel software tool (BRAT, Bayesian Recombination Tracker) implementing the model and the corresponding learning algorithm. The tool is capable of identifying the posterior optimal structure and estimating the marginal posterior probabilities of putative origins over the sites.
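To make the notion of per-site marginal posterior probabilities of origin concrete, the following toy sketch may help. It is not the BRAT model or its learning algorithm: it swaps in a plain two-parameter hidden Markov model with forward-backward smoothing, and the function name `posterior_origins`, the consensus sequences, and the parameters `stay` and `err` are all hypothetical choices made only for illustration.

```python
def posterior_origins(seq, consensus, stay=0.95, err=0.05):
    """Toy forward-backward smoothing over putative origins.

    Illustrative assumption only, NOT the BRAT model:
      seq        query sequence (string of nucleotides)
      consensus  one consensus string per putative origin, same length as seq
      stay       probability that adjacent sites share the same origin
      err        probability that a site differs from its origin's consensus
    Returns post[i][j] = P(origin of site i is j | whole sequence).
    """
    n, k = len(seq), len(consensus)
    if k < 2:
        raise ValueError("need at least two putative origins")
    if any(len(c) != n for c in consensus):
        raise ValueError("consensus sequences must match the query length")
    switch = (1.0 - stay) / (k - 1)  # mass spread evenly over the other origins

    def emit(i, j):
        # Likelihood of the observed base at site i under origin j.
        return 1.0 - err if seq[i] == consensus[j][i] else err / 3.0

    # Forward pass, normalised at every site to avoid underflow.
    fwd = []
    prev = [1.0 / k] * k  # uniform prior over origins
    for i in range(n):
        if i == 0:
            cur = [prev[j] * emit(0, j) for j in range(k)]
        else:
            total = sum(prev)
            cur = [(prev[j] * stay + (total - prev[j]) * switch) * emit(i, j)
                   for j in range(k)]
        s = sum(cur)
        prev = [c / s for c in cur]
        fwd.append(prev)

    # Backward pass, also normalised at every site.
    bwd = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        w = [bwd[i + 1][j] * emit(i + 1, j) for j in range(k)]
        total = sum(w)
        nxt = [w[j] * stay + (total - w[j]) * switch for j in range(k)]
        s = sum(nxt)
        bwd[i] = [x / s for x in nxt]

    # Combine both passes into per-site marginal posteriors.
    post = []
    for i in range(n):
        p = [fwd[i][j] * bwd[i][j] for j in range(k)]
        s = sum(p)
        post.append([x / s for x in p])
    return post


if __name__ == "__main__":
    # A query that matches origin 0 except for a short stretch matching origin 1.
    query = "AAAACCCCAAAA"
    origins = ["A" * 12, "C" * 12]
    for site, probs in enumerate(posterior_origins(query, origins)):
        print(site, [round(p, 3) for p in probs])
```

In this sketch a contiguous run of sites whose posterior mass shifts to a different origin plays the role of a putative recombinant segment, and the per-site probabilities express how uncertain that assignment is.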
A range of challenging simulation scenarios and an analysis of real data from seven housekeeping genes of 120 strains of the genus Burkholderia are used to illustrate the possibilities offered by our approach. The software is freely available for download at http://web.abo.fi/fak/mnf//mate/jc/software/brat.html.