School of Electrical Engineering, Korea University, Seoul 136-713, Republic of Korea.
Biochem Biophys Res Commun. 2010 Sep 17;400(2):219-24. doi: 10.1016/j.bbrc.2010.08.042. Epub 2010 Aug 16.
An increasing number of protein structures are being discovered, but most of them still have little functional information. On the assumption that structural resemblance leads to functional similarity, researchers computationally compare a new structure with functionally annotated structures for high-throughput function prediction. The effectiveness of this approach depends critically on the quality of the comparison. In particular, robust classification often becomes difficult when a function class is an aggregate of multiple subclasses, as is the case with protein annotations. For such multiple-subclass classification problems, an optimal method termed maximin correlation analysis (MCA) has been proposed. MCA can minimize the misclassification risk in correlation-based nearest-neighbor classification and thus increase classification accuracy, yet it has never been applied to automated protein function prediction. In this article, we apply MCA to classifying three-dimensional protein local environment data derived from a subset of the Protein Data Bank (PDB). In our framework, the MCA-based classifier outperformed the compared alternatives by 7-19% and 6-27% in terms of average sensitivity and specificity, respectively. Given that correlation-based similarity measures have been widely used for mining protein data, we expect that MCA could also be employed to enhance other types of automated function prediction methods.
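To make the idea of correlation-based nearest-neighbor classification with one representative per class more concrete, the following is a minimal sketch and not the method evaluated in the article. It assumes each local environment is already encoded as a numeric feature vector, and it replaces the optimal MCA template with a simple heuristic: among the members of a class, pick the one whose smallest Pearson correlation with the other members is largest. The function names (`pearson`, `maximin_member`, `classify`) and the toy vectors are hypothetical.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation between two 1-D feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def maximin_member(samples):
    """Heuristic stand-in for an MCA template (an assumption, not the
    optimal solution): return the class member whose minimum correlation
    with all members of the class is largest."""
    X = np.asarray(samples, dtype=float)
    C = np.corrcoef(X)  # each row of X is one sample
    return X[np.argmax(C.min(axis=1))]


def classify(x, templates):
    """Assign x to the class whose template correlates best with it."""
    return max(templates, key=lambda label: pearson(x, templates[label]))


# Toy data: each class mixes two "subclasses" plus an in-between member.
class_a = [[1, 2, 3, 4], [1, 3, 2, 4], [1, 2.5, 2.5, 4]]
class_b = [[4, 3, 2, 1], [4, 2, 3, 1], [4, 2.5, 2.5, 1]]
templates = {"A": maximin_member(class_a), "B": maximin_member(class_b)}
```

With these toy vectors, `classify([2, 4, 3, 5], templates)` selects class `"A"`. The point of a maximin-style template is that no subclass is left poorly represented: the representative is chosen by its worst-case correlation rather than its average correlation.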