IEEE/ACM Trans Comput Biol Bioinform. 2020 Sep-Oct;17(5):1714-1720. doi: 10.1109/TCBB.2019.2898943. Epub 2019 Feb 12.
The fusion peptide (FP) is a pivotal domain for the entry of retroviruses into host cells, where they continue self-replication. This crucial role makes FP a promising drug target for therapeutic intervention. The FP model proposed in our previous work is relatively inefficient at predicting FPs in retroviruses. In this work, we therefore propose a new computational model to predict FP domains in all retroviruses. It predicts FP domains by recognizing their start and end sites separately with a support vector machine (SVM), combined with knowledge of the hydrophobicity of the subdomain around the furin cleavage site. First, the classification accuracy is 91.91, 91.20, and 89.13 percent under the jackknife, 10-fold cross-validation, and 5-fold cross-validation tests, respectively. Second, the model discovered 69,753 and 493 putative FPs after scanning amino acid sequences and human endogenous retrovirus (HERV) DNA sequences, respectively, neither of which had FP annotations. Subsequently, a statistical analysis of the 69,753 putative FP sequences confirmed that FP is a hydrophobic domain. Lastly, we mapped the distribution of the 493 putative FP sequences across human chromosomes and HERV families, which suggests that HERV FPs probably have chromosome and family preferences.
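The hydrophobicity cue mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' SVM model. It only scores the residues immediately downstream of a candidate furin cleavage site with the Kyte-Doolittle hydropathy scale. The motif `R-X-[K/R]-R`, the 16-residue window, and the function names are assumptions introduced here for illustration; the abstract does not specify them.

```python
import re

# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed furin consensus R-X-[K/R]-R; the motif actually used by the
# published model is not stated in the abstract.
FURIN_MOTIF = re.compile(r"R.[KR]R")


def mean_hydropathy(segment):
    """Average Kyte-Doolittle score, skipping non-standard residues."""
    scores = [KD[aa] for aa in segment if aa in KD]
    return sum(scores) / len(scores) if scores else 0.0


def downstream_hydropathy(sequence, window=16):
    """Return (cut_position, mean hydropathy) for each motif match.

    cut_position is the 0-based index of the first residue after the
    motif; `window` (hypothetical default) is how many residues are scored.
    """
    seq = sequence.upper()
    results = []
    for match in FURIN_MOTIF.finditer(seq):
        cut = match.end()
        segment = seq[cut:cut + window]
        if segment:  # ignore motifs at the very end of the sequence
            results.append((cut, mean_hydropathy(segment)))
    return results


if __name__ == "__main__":
    # Illustrative sequence only: a furin-like motif followed by a
    # hydrophobic stretch.
    toy = "MKTRAKRAVGIGALFLGFLGAAG"
    for cut, score in downstream_hydropathy(toy):
        print(f"cleavage after position {cut}: mean hydropathy {score:.2f}")
```

A strongly positive mean downstream of the motif is consistent with a hydrophobic FP-like segment. In a full pipeline, a score like this would be only one input alongside a trained classifier for the start and end sites.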