Liver Diseases-Viral Hepatitis, Liver Unit, Vall d'Hebron Institut de Recerca (VHIR), Vall d'Hebron Hospital Universitari, Vall d'Hebron Barcelona Hospital Campus, Passeig Vall d'Hebron 119-129, 08035 Barcelona, Spain.
Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain.
Viruses. 2024 Apr 29;16(5):710. doi: 10.3390/v16050710.
Quasispecies diversity studies often require comparing two samples of different sizes. However, some diversity indices are sensitive to sample size, which makes direct comparison unreliable. Rarefaction addresses this by normalizing the samples to a common size so that they can be compared fairly. This study emphasizes the need for sample size normalization in quasispecies diversity studies using next-generation sequencing (NGS) data. We examine resampling schemes in detail using several simple hypothetical quasispecies that differ in structure, that is, in haplotype genomic composition, and discuss what the results imply for general cases. Despite the large numbers involved in this kind of study, where coverage often exceeds 100,000 reads per sample and amplicon, rarefaction for normalization should be performed by repeated resampling without replacement, especially when rare haplotypes make up a significant fraction of interest. Diversity indices differ, however, in their sensitivity to sample size. Consequently, some diversity indicators can be compared directly without normalization, or can be safely resampled with replacement.
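As an illustration of the rarefaction procedure described in the abstract, the following minimal Python sketch down-samples a vector of haplotype read counts to a common size, repeats the draw many times, and averages a diversity index over the repetitions. It is not the authors' implementation. The function names (`rarefy`, `rarefied_index`, `shannon_entropy`, `n_haplotypes`), the number of repetitions, and the choice of Shannon entropy and number of haplotypes as example indices are all assumptions made for this sketch. It uses only the standard library and needs Python 3.9 or later for the `counts` argument of `random.sample`.

```python
import math
import random
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a vector of haplotype read counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def n_haplotypes(counts):
    """Number of haplotypes observed with at least one read."""
    return sum(1 for c in counts if c > 0)


def rarefy(counts, size, replace=False, rng=None):
    """Draw `size` reads from the sample described by `counts`.

    replace=False draws reads without replacement (each read used at most once);
    replace=True draws with replacement, proportionally to the counts.
    Returns the down-sampled counts in the same haplotype order.
    """
    rng = rng or random.Random()
    labels = range(len(counts))
    if replace:
        draws = rng.choices(labels, weights=counts, k=size)
    else:
        # counts= repeats each haplotype label as many times as it has reads
        draws = rng.sample(labels, k=size, counts=counts)
    tally = Counter(draws)
    return [tally.get(i, 0) for i in labels]


def rarefied_index(counts, size, index, reps=100, replace=False, seed=0):
    """Mean of a diversity index over `reps` repeated down-samplings."""
    rng = random.Random(seed)
    values = [index(rarefy(counts, size, replace, rng)) for _ in range(reps)]
    return sum(values) / reps


if __name__ == "__main__":
    # Hypothetical quasispecies: two dominant haplotypes plus 1,000 singletons.
    large_sample = [6000, 3000] + [1] * 1000
    target_size = 2000  # size of the smaller sample being compared against

    for replace in (False, True):
        h = rarefied_index(large_sample, target_size, shannon_entropy,
                           replace=replace)
        k = rarefied_index(large_sample, target_size, n_haplotypes,
                           replace=replace)
        print(f"replace={replace}: entropy={h:.3f}, haplotypes={k:.1f}")
```

Running both schemes on the same counts gives a simple way to check, for a given index and quasispecies structure, whether drawing with or without replacement changes the rarefied value.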