Laydon Daniel J, Melamed Anat, Sim Aaron, Gillet Nicolas A, Sim Kathleen, Darko Sam, Kroll J Simon, Douek Daniel C, Price David A, Bangham Charles R M, Asquith Becca
Section of Immunology, Wright-Fleming Institute, Imperial College School of Medicine, London, United Kingdom.
Centre for Integrative Systems Biology and Bioinformatics, South Kensington Campus, Imperial College, London, United Kingdom.
PLoS Comput Biol. 2014 Jun 19;10(6):e1003646. doi: 10.1371/journal.pcbi.1003646. eCollection 2014 Jun.
Estimation of immunological and microbiological diversity is vital to our understanding of infection and the immune response. For instance, what is the diversity of the T cell repertoire? Such questions are partially addressed by high-throughput sequencing techniques that enable identification of immunological and microbiological "species" in a sample. Because a sample rarely captures every species, estimators of the number of unseen species are needed to infer population diversity from sample diversity. Here we test five widely used non-parametric estimators, and develop and validate a novel method, DivE, to estimate species richness and distribution. We used three independent datasets: (i) viral populations from subjects infected with human T-lymphotropic virus type 1; (ii) T cell antigen receptor clonotype repertoires; and (iii) microbial data from infant faecal samples. When applied to datasets with rarefaction curves that did not plateau, the estimates produced by existing estimators systematically increased with sample size. In contrast, DivE consistently and accurately estimated diversity for all datasets. We identify conditions that limit the application of DivE. We also show that DivE can be used to accurately estimate the underlying population frequency distribution. In summary, we have developed a novel method that is significantly more accurate than commonly used biodiversity estimators in microbiological and immunological populations.
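To make the terms "rarefaction curve" and "non-parametric estimator" concrete, the following minimal Python sketch computes an analytic rarefaction curve and the Chao1 richness estimate from a vector of per-species counts. Chao1 is used here only as a familiar example of a non-parametric estimator; the abstract does not say which five estimators were tested, and this sketch is not the DivE method. The function names and the toy counts are hypothetical and chosen purely for illustration.

```python
from math import comb


def chao1(counts):
    """Chao1 lower-bound estimate of species richness from per-species counts."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)                      # species actually observed
    f1 = sum(1 for c in counts if c == 1)    # singletons
    f2 = sum(1 for c in counts if c == 2)    # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / 2


def rarefaction(counts, n):
    """Expected number of species in a random subsample of n individuals,
    drawn without replacement from the observed sample."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the sample size")
    denom = comb(total, n)
    # A species with c individuals is missed with probability C(total - c, n) / C(total, n).
    return sum(1 - comb(total - c, n) / denom for c in counts)


# Toy data (invented for illustration): 10 observed species, 103 individuals.
toy_counts = [50, 30, 10, 5, 2, 2, 1, 1, 1, 1]
curve = [rarefaction(toy_counts, n) for n in range(0, sum(toy_counts) + 1, 10)]
estimate = chao1(toy_counts)
```

If the values in `curve` are still rising at the largest subsample size, the rarefaction curve has not plateaued. According to the abstract, that is the regime in which estimates from existing estimators tend to increase with sample size.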