Gregori Josep, Salicrú Miquel, Domingo Esteban, Sanchez Alex, Esteban Juan I, Rodríguez-Frías Francisco, Quer Josep
Liver Unit, Internal Medicine Lab Malalties Hepàtiques, Vall d'Hebron Institut Recerca (VHIR-HUVH), 08035 Barcelona, Spain, Roche Diagnostics SL, 08174, Sant Cugat del Vallès, Spain, Statistics Department, Biology Faculty, Barcelona University, 08028, Barcelona, Spain, CIBER de Enfermedades Hepáticas y Digestivas (CIBERehd) del Instituto de Salud Carlos III, 28029 Madrid, Spain, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Campus de Cantoblanco, 28049, Madrid, Spain, Bioinformatics and Statistics Unit, Vall d'Hebron Institut Recerca (VHIR-HUVH), 08035, Barcelona, Spain, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain and Biochemistry Unit. Virology Unit/Microbiology Department, HUVH, 08035 Barcelona, Spain.
Bioinformatics. 2014 Apr 15;30(8):1104-1111. doi: 10.1093/bioinformatics/btt768. Epub 2014 Jan 2.
Given the inherent dynamics of a viral quasispecies, we are often interested in comparing diversity indices between sequential samples from a patient, or between groups of patients in a treated-versus-control design. It is then important to ensure that the diversity measures from each sample can be compared without bias and within a consistent statistical framework. In the present report, we review some indices commonly used to measure viral quasispecies complexity and provide means for statistical inference, applying procedures taken from the field of ecology. In particular, we examine the Shannon entropy and the mutation frequency, and we discuss the appropriateness of the different normalizations of the Shannon entropy found in the literature. Taking raw amplicon ultra-deep pyrosequencing (UDPS) data as a surrogate for a real hepatitis C virus (HCV) population, we use in silico sampling to study the statistical properties of these indices under two methods of viral quasispecies sampling: classical cloning followed by Sanger sequencing (CCSS) and next-generation sequencing (NGS), such as UDPS. We propose solutions specific to each of these two sampling methods, CCSS and NGS, to ensure statistically sound conclusions that are as free of bias as possible.
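As a rough illustration of the indices discussed above, the Python sketch below computes the plug-in Shannon entropy of haplotype frequencies (natural log assumed), two normalizations of it (by ln N, the number of sequences, and by ln k, the number of distinct haplotypes), a mutation frequency measured against the most frequent haplotype, the Hutcheson variance approximation for H used in ecology, and an in silico subsampling loop. All function names and the toy population are hypothetical, and the Hutcheson formula is a standard ecology procedure swapped in for illustration; the exact definitions and inference procedures used by the authors may differ.

```python
import math
import random
from collections import Counter


def shannon_entropy(haplotypes):
    """Plug-in Shannon entropy H = -sum(p_i * ln p_i) over haplotype frequencies."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def entropy_norm_by_sequences(haplotypes):
    """H divided by ln(N), where N is the number of sequences (clones or reads)."""
    n = len(haplotypes)
    return shannon_entropy(haplotypes) / math.log(n) if n > 1 else 0.0


def entropy_norm_by_haplotypes(haplotypes):
    """H divided by ln(k), where k is the number of distinct haplotypes (Pielou evenness)."""
    k = len(set(haplotypes))
    return shannon_entropy(haplotypes) / math.log(k) if k > 1 else 0.0


def mutation_frequency(haplotypes):
    """Mismatches against the most frequent haplotype, divided by all nucleotides sequenced.

    Assumption: haplotypes are already aligned and of equal length.
    """
    master, _ = Counter(haplotypes).most_common(1)[0]
    mismatches = sum(sum(a != b for a, b in zip(h, master)) for h in haplotypes)
    return mismatches / sum(len(h) for h in haplotypes)


def entropy_variance(haplotypes):
    """Hutcheson (1970) approximation to the sampling variance of H."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    ps = [c / n for c in counts.values()]
    s1 = sum(p * math.log(p) ** 2 for p in ps)
    s2 = sum(p * math.log(p) for p in ps)
    return (s1 - s2 ** 2) / n + (len(counts) - 1) / (2 * n ** 2)


def mean_subsample_entropy(population, sample_size, n_reps=1000, seed=1):
    """Average plug-in entropy of random subsamples drawn without replacement."""
    rng = random.Random(seed)
    draws = (shannon_entropy(rng.sample(population, sample_size)) for _ in range(n_reps))
    return sum(draws) / n_reps


# Hypothetical toy population: 1000 aligned 10-nt haplotypes (not real data).
POPULATION = (["AAAAAAAAAA"] * 600 + ["AAAAAAAAAT"] * 200
              + ["AAAAAAAATA"] * 100 + ["AAAAAAATAA"] * 100)

if __name__ == "__main__":
    small = POPULATION[::100]  # 10 sequences with the same haplotype frequencies
    for label, seqs in (("N=1000", POPULATION), ("N=10", small)):
        print(label,
              "H:", round(shannon_entropy(seqs), 4),
              "H/lnN:", round(entropy_norm_by_sequences(seqs), 4),
              "H/lnk:", round(entropy_norm_by_haplotypes(seqs), 4),
              "Var(H):", round(entropy_variance(seqs), 6))
    print("Mutation frequency:", mutation_frequency(POPULATION))
    for size in (10, 30, 100):  # clone-scale versus larger sample sizes
        print("sample size", size, "mean H:", round(mean_subsample_entropy(POPULATION, size), 4))
```

In this toy example the full population and its 10-sequence slice share identical haplotype frequencies, so the raw H and H / ln k agree, while H / ln N changes severalfold purely because N changes. The subsampling loop likewise shows the downward bias of the plug-in entropy at small, clone-scale sample sizes.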
Contact: josep.gregori@gmail.com
Supplementary information: Supplementary data are available at Bioinformatics online.