Ratmann Oliver, Wymant Chris, Colijn Caroline, Danaviah Siva, Essex M, Frost Simon D W, Gall Astrid, Gaiseitsiwe Simani, Grabowski Mary, Gray Ronald, Guindon Stephane, von Haeseler Arndt, Kaleebu Pontiano, Kendall Michelle, Kozlov Alexey, Manasa Justen, Minh Bui Quang, Moyo Sikhulile, Novitsky Vladimir, Nsubuga Rebecca, Pillay Sureshnee, Quinn Thomas C, Serwadda David, Ssemwanga Deogratius, Stamatakis Alexandros, Trifinopoulos Jana, Wawer Maria, Leigh Brown Andrew, de Oliveira Tulio, Kellam Paul, Pillay Deenan, Fraser Christophe
Imperial College London School of Public Health, Department of Infectious Disease Epidemiology, London, United Kingdom of Great Britain and Northern Ireland;
Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, Oxford, United Kingdom of Great Britain and Northern Ireland
AIDS Res Hum Retroviruses. 2017 Nov;33(11):1083-1098. doi: 10.1089/AID.2017.0061. Epub 2017 May 25.
To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the 'Phylogenetics and Networks for Generalised HIV Epidemics in Africa' consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n=2,833; MRC/UVRI Uganda, n=701; Mochudi Prevention Project, n=359; Africa Health Research Institute Resistance Cohort, n=92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef gene could be determined for all sequences from South Africa, for 75% of sequences from Mochudi, for 60% of sequences from MRC/UVRI Uganda, and for 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3' end of the genome, was not associated with subtype diversity except for HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected.
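The sequencing-success figures above are per-site shares of consensus sequences whose gag-to-nef region is more than 80% determined. The sketch below shows one way such a share could be computed. It is illustrative only and is not the consortium's pipeline. It assumes that each consensus sequence has already been cut to a common gag-to-nef window and that undetermined positions are encoded as 'N' or '?'. The function names `coverage_fraction` and `share_above_threshold` are hypothetical.

```python
from collections import defaultdict

# Assumption: undetermined (missing) positions are coded as 'N' or '?';
# every other character counts as a determined position.
MISSING = set("Nn?")


def coverage_fraction(seq: str) -> float:
    """Fraction of positions in the (assumed gag-to-nef) window that are determined."""
    if not seq:
        return 0.0
    determined = sum(1 for c in seq if c not in MISSING)
    return determined / len(seq)


def share_above_threshold(records, threshold=0.8):
    """records: iterable of (site, sequence) pairs.

    Returns {site: share of that site's sequences with coverage strictly
    above `threshold`}, mirroring the "more than 80%" criterion.
    """
    totals = defaultdict(int)
    passing = defaultdict(int)
    for site, seq in records:
        totals[site] += 1
        if coverage_fraction(seq) > threshold:
            passing[site] += 1
    return {site: passing[site] / totals[site] for site in totals}


if __name__ == "__main__":
    # Toy sequences, not PANGEA-HIV data.
    toy = [
        ("siteA", "ACGTACGTAC"),  # fully determined
        ("siteA", "ACGTNNNNNN"),  # 40% determined
        ("siteB", "ACGTACGTNA"),  # 90% determined
    ]
    print(share_above_threshold(toy))
```

A strict `>` comparison is used so that a sequence exactly at the threshold does not count as a success. Real consensus sequences would also need a decision on alignment gaps ('-'), which this sketch simply counts as determined.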