Wróbel Borys, Torres-Puente Manuela, Jiménez Nuria, Bracho María Alma, García-Robles Inmaculada, Moya Andrés, González-Candelas Fernando
Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Valencia, Spain.
Mol Biol Evol. 2006 Jun;23(6):1242-53. doi: 10.1093/molbev/msk012. Epub 2006 Apr 3.
The assumption of a molecular clock for dating events from sequence information is often frustrated by heterogeneity in evolutionary rates caused, among other factors, by positively selected sites. In this work, our goal is to explore methods to estimate infection dates from sequence analysis. One such method, based on site stripping for clock detection, was proposed to recover clocklike molecular evolution in sequences showing high variability of evolutionary rates and in the presence of positive selection. Alternative approaches accommodate heterogeneity in evolutionary rates at various levels without discarding any information from the data. Here we present the analysis of a data set of hepatitis C virus (HCV) sequences from 24 patients who were infected by a single individual and whose dates of infection are known. We first used a simple criterion of relative substitution rate for site removal prior to a regression analysis. Time was regressed on maximum likelihood pairwise evolutionary distances between the sequences sampled from the source individual and those sampled from the infected patients. We show that it is indeed the fastest evolving sites that disturb the molecular clock and that these sites correspond to positively selected codons. The high computational efficiency of the regression analysis allowed us to compare the site-stripping scheme with random removal of sites. We demonstrate that removing the fast-evolving sites significantly increases the accuracy of infection-time estimates based on a single substitution rate. However, the time-of-infection estimates improved substantially when a more sophisticated and computationally demanding Bayesian method was applied to the same data set with all sequence positions retained. Consequently, despite the distortion introduced by positive selection on evolutionary rates, it is possible to obtain quite accurate estimates of infection dates, a result of particular relevance for molecular epidemiology studies.
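The site-stripping and regression steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the Jukes-Cantor (JC69) distance stands in for the maximum likelihood distances used in the study, per-site relative rates are assumed to have been estimated beforehand, and forcing the regression through the origin (zero divergence at the time of infection) is an assumption of this example.

```python
import math


def strip_fastest_sites(site_rates, fraction):
    """Return sorted indices of the sites kept after dropping the
    fastest-evolving `fraction` of sites (hypothetical helper)."""
    n_drop = int(len(site_rates) * fraction)
    ranked = sorted(range(len(site_rates)), key=lambda i: site_rates[i])
    return sorted(ranked[:len(site_rates) - n_drop])


def jc69_distance(seq_a, seq_b, sites):
    """Jukes-Cantor distance computed over the given site indices.
    Used here only as a simple stand-in for an ML distance."""
    diffs = sum(1 for i in sites if seq_a[i] != seq_b[i])
    p = diffs / len(sites)
    if p >= 0.75:
        raise ValueError("sequences too divergent for JC69")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def fit_time_on_distance(distances, times):
    """Least-squares slope of time on distance, with the line forced
    through the origin (an assumption of this sketch)."""
    sxx = sum(d * d for d in distances)
    sxy = sum(d * t for d, t in zip(distances, times))
    return sxy / sxx


def estimate_time(slope, distance):
    """Predicted time since infection for a source-patient distance."""
    return slope * distance
```

In use, one would compute `kept = strip_fastest_sites(rates, fraction)`, obtain a source-to-patient distance for each patient with `jc69_distance(source, patient, kept)`, fit the slope on patients with known infection dates, and compare the resulting errors against the same procedure applied to randomly chosen sites, in the spirit of the comparison reported in the abstract.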