Ebina Teppei, Suzuki Ryosuke, Tsuji Ryotaro, Kuroda Yutaka
Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 12-24-16 Nakamachi, Koganei-shi, Tokyo, 184-8588, Japan
J Comput Aided Mol Des. 2014 Aug;28(8):831-9. doi: 10.1007/s10822-014-9763-x. Epub 2014 Jun 26.
Domain linker prediction is attracting much interest because it can help identify novel domains suitable for high-throughput proteomics analysis. Here, we report H-DROP, an SVM-based Helical Domain linker pRediction using OPtimal features. H-DROP is, to the best of our knowledge, the first predictor that specifically and effectively identifies helical linkers. This was made possible first because a large training dataset became available from IS-Dom, and second because we selected a small number of optimal features from a very large pool of candidates. The training dataset, which included 261 helical linkers, was constructed by detecting helical residues at the boundary regions of two independent structural domains listed in our previously reported IS-Dom dataset. Random forest was used to select 45 optimal feature candidates from 3,000 features, and stepwise selection further reduced these to 26 optimal features. The prediction sensitivity and precision of H-DROP were 35.2% and 38.8%, respectively. These values were over 10.7% higher than those of control methods, including our previously developed DROP, which is a coil linker predictor, and PPRODO, which is trained with undifferentiated domain boundary sequences. Overall, these results indicate that helical linkers can be predicted from sequence information alone by using a strictly curated training dataset for helical linkers and a carefully selected set of optimal features. H-DROP is available at http://domserv.lab.tuat.ac.jp.
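To make the two reported metrics concrete, the following minimal sketch computes sensitivity (the fraction of true linker residues that were predicted) and precision (the fraction of predicted residues that are true linker residues). It assumes a residue-level comparison of position sets; the abstract does not state the exact evaluation protocol used for H-DROP, so the function name `sensitivity_precision` and the toy positions are purely illustrative.

```python
def sensitivity_precision(true_linker, predicted_linker):
    """Return (sensitivity, precision) for two collections of residue positions.

    Assumption: evaluation is done per residue. The actual H-DROP protocol
    (for example, any tolerance window around a linker) is not given in the
    abstract and may differ.
    """
    true_set = set(true_linker)
    pred_set = set(predicted_linker)
    true_positives = len(true_set & pred_set)

    # Guard against empty sets so the function never divides by zero.
    sensitivity = true_positives / len(true_set) if true_set else 0.0
    precision = true_positives / len(pred_set) if pred_set else 0.0
    return sensitivity, precision


# Toy example (hypothetical positions, not data from the paper):
# the true linker spans residues 100-109, the prediction spans 105-112.
true_positions = range(100, 110)
predicted_positions = range(105, 113)

sens, prec = sensitivity_precision(true_positions, predicted_positions)
print(f"sensitivity={sens:.3f} precision={prec:.3f}")
# prints: sensitivity=0.500 precision=0.625
```

In this toy case five residues overlap, so sensitivity is 5/10 and precision is 5/8. Read the same way, the reported 35.2% and 38.8% mean that roughly one third of true helical linkers were recovered and slightly more than one third of the predictions were correct.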