Khan Asif M, Hu Yongli, Miotto Olivo, Thevasagayam Natascha M, Sukumaran Rashmi, Abd Raman Hadia Syahirah, Brusic Vladimir, Tan Tin Wee, Thomas August J
Centre for Bioinformatics, School of Data Sciences, Perdana University, Jalan MAEPS Perdana, Serdang, Selangor Darul Ehsan, 43400, Malaysia.
Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD, 21205, USA.
BMC Med Genomics. 2017 Dec 21;10(Suppl 4):78. doi: 10.1186/s12920-017-0301-2.
Viral vaccine target discovery requires understanding the diversity of both the virus and the human immune system. The readily available and rapidly growing pool of viral sequence data in the public domain enables the identification and characterization of immune targets relevant to adaptive immunity. A systematic bioinformatics approach is necessary to facilitate the analysis of such large datasets for the selection of potential candidate vaccine targets.
This work describes a computational methodology for this analysis, using data from dengue, West Nile, hepatitis A, HIV-1, and influenza A viruses as examples. The methodology has been implemented as an analytical pipeline that advances the field of reverse vaccinology by enabling systematic screening of known natural sequence data for the identification of vaccine targets. The pipeline comprises six key steps: (i) comprehensive and extensive collection of sequence data of viral proteomes (the virome), (ii) data cleaning, (iii) large-scale sequence alignments, (iv) peptide entropy analysis, (v) intra- and inter-species variation analysis of conserved sequences, including human homology analysis, and (vi) functional and immunological relevance analysis.
Combining these steps into a single pipeline yields a more refined process than a simple evolutionary conservation analysis, facilitating better selection of vaccine targets and their prioritization for subsequent experimental validation.
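As an illustration of step (iv), peptide entropy analysis, the following is a minimal sketch and not the authors' implementation. It assumes that the input is a set of already aligned protein sequences of equal length, that variability is measured as Shannon entropy over overlapping peptide windows, and that the window length is 9 residues. The function name `peptide_entropy`, the parameter `k`, and the rule of skipping windows that contain gaps or the ambiguous residue `X` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def peptide_entropy(aligned_seqs, k=9):
    """Return the Shannon entropy (in bits) of the k-mer peptides
    observed at each start position of an alignment.

    aligned_seqs: list of equal-length strings (gaps written as '-').
    k: peptide window length (9 is an assumption for this sketch).
    Positions with no usable peptide are reported as None.
    """
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")

    entropies = []
    for start in range(length - k + 1):
        windows = (seq[start:start + k] for seq in aligned_seqs)
        # Assumption: windows with gaps or ambiguous residues are ignored.
        peptides = [w for w in windows if "-" not in w and "X" not in w]
        if not peptides:
            entropies.append(None)
            continue
        counts = Counter(peptides)
        total = len(peptides)
        # H = sum over distinct peptides of p * log2(1 / p)
        entropy = sum((c / total) * math.log2(total / c)
                      for c in counts.values())
        entropies.append(entropy)
    return entropies


if __name__ == "__main__":
    # Toy alignment (invented data, for illustration only).
    toy = [
        "MKTAYIAKQRQ",
        "MKTAYIAKQRQ",
        "MKTAYIAKQRA",
        "MKTAYIAKQRA",
    ]
    for position, value in enumerate(peptide_entropy(toy), start=1):
        print(position, value)
```

Under these assumptions, an entropy of zero means that every usable sequence carries the same peptide at that position, so low-entropy positions are natural candidates for the conserved sequences examined in step (v). Real datasets would also need the sampling-bias and sequence-redundancy handling implied by the data cleaning step, which this sketch omits.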