Bioinformatics Centre, Institute of Microbial Technology-Council of Scientific and Industrial Research, Chandigarh, India.
PLoS One. 2013 Apr 15;8(4):e61437. doi: 10.1371/journal.pone.0061437. Print 2013.
HIV-1 infects the host cell by interacting with the primary receptor CD4 and a coreceptor, either CCR5 or CXCR4. Maraviroc, a CCR5 antagonist, binds to the CCR5 receptor and is therefore effective only against virus that uses CCR5. Thus, it is important to identify the coreceptor used by the HIV strains dominating in the patient. In the past, a number of experimental assays and in-silico techniques have been developed for predicting coreceptor tropism. The prediction accuracy of these methods is excellent for CCR5 (R5)-tropic sequences but relatively poor for CXCR4 (X4)-tropic sequences. Therefore, any new method for accurate determination of coreceptor usage would be of paramount importance to the successful management of HIV-infected individuals.
The dataset used in this study comprised 1799 R5-tropic and 598 X4-tropic third variable (V3) loop sequences of HIV-1. We compared the amino acid composition of both types of V3 sequences and observed that certain residues, e.g., asparagine and isoleucine, were preferred in R5-tropic sequences, whereas residues such as lysine, arginine, and tryptophan were preferred in X4-tropic sequences. Initially, Support Vector Machine-based models were developed using amino acid composition, dipeptide composition, and split amino acid composition; these achieved accuracy of up to 90%. We also used BLAST to discriminate R5- and X4-tropic sequences and correctly predicted 93.16% of R5-tropic and 75.75% of X4-tropic sequences. To improve the prediction accuracy, a hybrid model was developed that achieved 91.66% sensitivity, 81.77% specificity, 89.19% accuracy, and a Matthews Correlation Coefficient of 0.72. Our models were also evaluated on an independent dataset (256 R5-tropic and 81 X4-tropic sequences), on which they achieved a maximum accuracy of 84.87% with a Matthews Correlation Coefficient of 0.63.
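The composition features and performance measures named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the example peptide (a V3-like sequence used purely as illustrative input), and the confusion-matrix counts are assumptions. The counts are reconstructed by rounding the reported sensitivity and specificity against the 1799 R5-tropic and 598 X4-tropic sequences, treating R5 as the positive class.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(seq):
    """Percentage of each standard residue in seq (20 values)."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    return [100.0 * residues.count(a) / len(residues) for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """Percentage of each ordered residue pair in seq (400 values)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Keep only pairs made of two standard residues.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    return [100.0 * pairs.count(a + b) / len(pairs)
            for a, b in product(AMINO_ACIDS, repeat=2)]

def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy (all in %) and MCC."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sensitivity, specificity, accuracy, mcc

# Illustrative 35-residue V3-like peptide (assumption, not from the dataset).
example_v3 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
aac = amino_acid_composition(example_v3)
dpc = dipeptide_composition(example_v3)

# Assumed counts: 1649 of 1799 R5 and 489 of 598 X4 sequences correct.
sens, spec, acc, mcc = binary_metrics(tp=1649, fn=150, tn=489, fp=109)
print(f"sensitivity={sens:.2f}% specificity={spec:.2f}% "
      f"accuracy={acc:.2f}% MCC={mcc:.2f}")
```

With these assumed counts, the four measures agree with the values reported for the hybrid model to within rounding, which suggests that sensitivity refers to R5-tropic and specificity to X4-tropic sequences.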
This study describes a highly efficient method for predicting HIV-1 coreceptor usage from V3 sequences. To provide a service to the scientific community, a web server, HIVcoPred, was developed (http://www.imtech.res.in/raghava/hivcopred/) for predicting coreceptor usage.