Upadhyay Atul Kumar, Sowdhamini Ramanathan
National Centre for Biological Sciences (TIFR), GKVK Campus, Bellary Road, Bangalore 560 065, India.
PLoS One. 2016 Jul 28;11(7):e0159627. doi: 10.1371/journal.pone.0159627. eCollection 2016.
3D-domain swapping is one of the mechanisms of protein oligomerization, and proteins that exhibit it have many biological functions. Domain-swapping proteins have attracted considerable attention owing to their involvement in human diseases such as conformational diseases, amyloidosis, serpinopathies and proteinopathies. Early identification of the proteins in the human genome that retain a tendency to domain swap would support many aspects of disease control and management. Predictive models were developed using machine learning approaches, with an average accuracy of 78% (85.6% sensitivity, 87.5% specificity and a Matthews correlation coefficient (MCC) of 0.72), to predict putative domain swapping in protein sequences. These models were applied to many complete genomes, with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted human sequences to examine their domain distribution, disease association and functional importance based on Gene Ontology (GO). Finally, we developed a hinge-region prediction method for putative domain-swapped sequences, using important physicochemical properties of amino acids.
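The performance figures quoted in the abstract (accuracy, sensitivity, specificity and MCC) are standard measures derived from a binary confusion matrix. The sketch below shows how they are conventionally computed. It is illustrative only: the function name `binary_metrics` and the example counts are hypothetical and are not taken from the study, whose confusion matrices are not given here.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: swapped proteins predicted as swapped / as non-swapped.
    tn/fp: non-swapped proteins predicted as non-swapped / as swapped.
    """
    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a class is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)            # true positive rate
    specificity = ratio(tn, tn + fp)            # true negative rate
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    # Matthews correlation coefficient: ranges from -1 to +1,
    # with 0 corresponding to chance-level prediction.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, denom)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes differ greatly in size.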