School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China.
IEEE Trans Nanobioscience. 2011 Jun;10(2):121-9. doi: 10.1109/TNB.2011.2160730. Epub 2011 Jul 7.
Identifying transmembrane helices (TMHs) is one of the most important steps in membrane protein structure prediction. Existing TMH predictors tend to pursue accurate computational models without carefully considering their interpretability, and thus act as black boxes. In this paper, we present SOMRuler, a novel TMH predictor that offers excellent interpretability while retaining high prediction accuracy. SOMRuler uses a self-organizing map (SOM) to learn knowledge of the helix distribution from the training samples; this knowledge is encoded in the codebook vectors of the trained SOM. Human-interpretable fuzzy rules are then extracted from these codebook vectors. Extracting fuzzy rules from the learned knowledge rather than from the original training samples has two benefits: it greatly reduces the computational burden of rule extraction, and it improves the reliability of the extracted rules, because the SOM learning procedure smooths out noise contained in the original samples. The validity of the fuzzy rules extracted by SOMRuler is analyzed both qualitatively and quantitatively. Experimental results on the benchmark dataset show that SOMRuler outperforms most existing popular TMH predictors and is flexible enough to suit a wide variety of problems in bioinformatics. The SOMRuler software is implemented in Java and Matlab and is available for academic use at: http://www.csbio.sjtu.edu.cn/bioinf/SOMRuler/.
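The two-stage idea (train a SOM, then read fuzzy rules off its codebook vectors instead of the raw samples) can be illustrated with a minimal sketch. It assumes a 1-D SOM, one rule per codebook vector with Gaussian membership functions centred on the vector's components, a rule consequent given by majority vote of the training samples mapped to that node, and winner-take-all inference. These choices, all function names, and the two-feature toy data are hypothetical illustrations, not SOMRuler's actual encoding or rule-extraction procedure.

```python
import math
import random


def best_matching_unit(codebook, x):
    """Index of the codebook vector closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, codebook[j])))


def train_som(samples, n_nodes, epochs=50, lr0=0.5, seed=0):
    """Online training of a 1-D SOM; returns the learned codebook vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialise each codebook vector as a copy of a random training sample.
    codebook = [list(rng.choice(samples)) for _ in range(n_nodes)]
    sigma0 = max(1.0, n_nodes / 2)
    total, t = epochs * len(samples), 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            frac = t / total
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.01  # shrinking neighbourhood radius
            bmu = best_matching_unit(codebook, x)
            for j, w in enumerate(codebook):
                # Gaussian neighbourhood on the 1-D grid around the winning node.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            t += 1
    return codebook


def extract_rules(codebook, samples, labels):
    """One rule per non-empty node: (centres, majority label of its samples)."""
    votes = [{} for _ in codebook]
    for x, lab in zip(samples, labels):
        j = best_matching_unit(codebook, x)
        votes[j][lab] = votes[j].get(lab, 0) + 1
    # Nodes that win no training sample ("dead" nodes) yield no rule.
    return [(list(w), max(v, key=v.get)) for w, v in zip(codebook, votes) if v]


def firing_strength(centres, x, width):
    """Product of Gaussian memberships 'x_i is about c_i'."""
    return math.prod(math.exp(-((xi - ci) ** 2) / (2.0 * width ** 2))
                     for xi, ci in zip(x, centres))


def classify(rules, x, width=0.5):
    """Winner-take-all inference: the most strongly fired rule gives the class."""
    return max(rules, key=lambda r: firing_strength(r[0], x, width))[1]


def describe_rule(centres, label):
    """Render a rule in human-readable IF-THEN form."""
    conds = " AND ".join(f"x{i + 1} is about {c:.2f}" for i, c in enumerate(centres))
    return f"IF {conds} THEN class = {label}"


def make_toy_data(seed=1, n=50):
    """Hypothetical two-feature samples for two well-separated classes."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        X.append([rng.gauss(0.8, 0.05), rng.gauss(0.8, 0.05)])
        y.append("TMH")
        X.append([rng.gauss(0.2, 0.05), rng.gauss(0.2, 0.05)])
        y.append("non-TMH")
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    rules = extract_rules(train_som(X, n_nodes=4), X, y)
    for centres, label in rules:
        print(describe_rule(centres, label))
    print(classify(rules, [0.85, 0.75]))
```

Because rule extraction scans only the handful of codebook vectors rather than every training sample, the number of rules is bounded by the map size, which mirrors the computational argument made in the abstract; the Gaussian width and map size here are arbitrary illustrative values.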