Laboratory of Cellular Biophysics, The Rockefeller University, New York, NY, United States.
Laboratorio de Bioinformática, Biología Molecular y Desarrollos Tecnológicos, Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru.
Front Immunol. 2020 Sep 3;11:2008. doi: 10.3389/fimmu.2020.02008. eCollection 2020.
Coronavirus disease (COVID-19), caused by the virus SARS-CoV-2, was responsible for more than 4.3 million confirmed cases and 295,000 deaths worldwide as of May 15, 2020. Ongoing efforts to control the pandemic include the development of peptide-based vaccines and diagnostic tests. In these approaches, HLA allelic diversity plays a crucial role. Despite its importance, current knowledge of HLA allele frequencies in South America is very limited. In this study, we performed a literature review of datasets reporting HLA frequencies of South American populations, available in the scientific literature and/or in the Allele Frequency Net Database. This allowed us to enrich the current picture with more than 12.8 million data points. As a result, we present updated HLA allelic frequencies by country, including 91 alleles whose frequencies were previously thought to be under 5% or were unknown. Using alleles with an updated frequency of at least 5% in any South American country, we predicted epitopes in SARS-CoV-2 proteins using NetMHCpan (I and II) and MHCflurry. The best predicted epitopes (class-I and class-II) were then selected based on their binding to South American alleles (Coverage Score). Class-II predicted epitopes were also filtered based on their three-dimensional exposure. We obtained 14 class-I and four class-II candidate epitopes with experimental evidence (reported in the Immune Epitope Database and Analysis Resource) and good coverage scores for South America. Additionally, we present 13 HLA-I and 30 HLA-II novel candidate epitopes without experimental evidence, including 16 class-II candidates in highly exposed, conserved areas of the NTD and RBD regions of the Spike protein. These novel candidates have even better coverage scores for South America than those with experimental evidence.
Finally, we show that recent similar studies presenting candidate epitopes also predicted some of our candidates but discarded them in the selection process, resulting in candidates with suboptimal coverage for South America. In conclusion, the candidate epitopes presented provide valuable information for the development of epitope-based strategies against SARS-CoV-2, such as peptide vaccines and diagnostic tests. Additionally, the updated HLA allelic frequencies provide a better representation of South America and may impact different immunogenetic studies.
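The selection logic summarized above (keep alleles that reach 5% in at least one country, then rank predicted epitopes by how well they bind those alleles) can be illustrated with a minimal Python sketch. This is not the study's implementation. The abstract does not define the Coverage Score, so the sketch assumes it is the fraction of selected alleles a peptide is predicted to bind. The function names, country labels, peptide labels, and all frequencies are hypothetical placeholders, and the binding predictions are assumed to have been produced beforehand by a tool such as NetMHCpan or MHCflurry.

```python
from typing import Dict, List, Set, Tuple

FREQ_THRESHOLD = 0.05  # the "at least 5% in any country" cutoff from the abstract


def select_alleles(freqs: Dict[str, Dict[str, float]],
                   threshold: float = FREQ_THRESHOLD) -> Set[str]:
    """Keep alleles whose frequency reaches the threshold in at least one country."""
    return {
        allele
        for country_freqs in freqs.values()
        for allele, freq in country_freqs.items()
        if freq >= threshold
    }


def coverage_score(binders: Set[str], selected: Set[str]) -> float:
    """ASSUMED definition: share of selected alleles the peptide is predicted to bind."""
    if not selected:
        return 0.0
    return len(binders & selected) / len(selected)


def rank_peptides(predicted: Dict[str, Set[str]],
                  selected: Set[str]) -> List[Tuple[str, float]]:
    """Sort peptides from highest to lowest assumed coverage score."""
    scores = {pep: coverage_score(alleles, selected)
              for pep, alleles in predicted.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative placeholder values only; these are not data from the study.
freqs = {
    "CountryA": {"HLA-A*02:01": 0.20, "HLA-A*24:02": 0.03, "HLA-B*35:01": 0.04},
    "CountryB": {"HLA-A*02:01": 0.15, "HLA-A*24:02": 0.12, "HLA-B*35:01": 0.01},
}
# Hypothetical predictor output: peptide label -> alleles it is predicted to bind.
predicted = {
    "PEPTIDE_A": {"HLA-A*02:01", "HLA-A*24:02"},
    "PEPTIDE_B": {"HLA-B*35:01"},
}

selected = select_alleles(freqs)
ranking = rank_peptides(predicted, selected)
```

Under this assumed definition, a peptide that binds only alleles below the 5% cutoff scores zero, which is one way a candidate could look strong in a global analysis yet rank poorly for a specific region, or the reverse.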