Gopal Kreshna, Romo Tod D, Sacchettini James C, Ioerger Thomas R
Department of Computer Science, Texas A&M University, USA.
Proc IEEE Comput Syst Bioinform Conf. 2004:255-65. doi: 10.1109/csb.2004.1332439.
Feature selection and weighting are central problems in pattern recognition and instance-based learning. In this work, we discuss the challenges of constructing and weighting features to recognize 3D patterns of electron density in order to determine protein structures. We present SLIDER, a feature-weighting algorithm that adjusts weights iteratively so that patterns matching a query instance are ranked higher than mismatching ones. SLIDER also chooses the weight values to consider in each iteration judiciously, by examining the specific weights at which matching and mismatching patterns switch as nearest neighbors to query instances. This approach reduces the space of weight vectors to be searched. We make two main observations: (1) SLIDER efficiently generates weights that contribute significantly to the retrieval of matching electron density patterns; (2) the optimum weight vector is sensitive to the distance metric, i.e., feature relevance can depend, to a certain extent, on the underlying metric used to compare patterns.
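The crossover idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a weighted L1 distance, weights confined to the interval (0, 1), training data given as (query, match, mismatch) triplets, and a coordinate-wise search that adjusts one feature weight at a time. All function names (`weighted_distance`, `crossover_weight`, `slide_feature`, `fit_weights`) are hypothetical. Under these assumptions, each distance is linear in the weight being adjusted, so the weight at which a match and a mismatch are equidistant from the query has a closed form, and the ranking can only change at those crossover points.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]  # (query, match, mismatch)


def weighted_distance(w: Vector, a: Vector, b: Vector) -> float:
    """Weighted L1 distance (an assumption; the abstract names no metric)."""
    return sum(wi * abs(ai - bi) for wi, ai, bi in zip(w, a, b))


def crossover_weight(w: Vector, k: int, q: Vector, m: Vector, n: Vector) -> Optional[float]:
    """Value of w[k] at which match m and mismatch n are equidistant from q."""
    slope_m = abs(q[k] - m[k])
    slope_n = abs(q[k] - n[k])
    if slope_m == slope_n:
        return None  # distances change in parallel, so they never cross
    rest = list(w)
    rest[k] = 0.0  # contribution of all features except k
    base_m = weighted_distance(rest, q, m)
    base_n = weighted_distance(rest, q, n)
    return (base_n - base_m) / (slope_m - slope_n)


def count_correct(w: Vector, triplets: Sequence[Triplet]) -> int:
    """Number of triplets in which the match is closer than the mismatch."""
    return sum(weighted_distance(w, q, m) < weighted_distance(w, q, n)
               for q, m, n in triplets)


def candidate_weights(w: Vector, k: int, triplets: Sequence[Triplet]) -> List[float]:
    """One probe value per interval between consecutive crossovers in (0, 1)."""
    cuts = set()
    for q, m, n in triplets:
        c = crossover_weight(w, k, q, m, n)
        if c is not None and 0.0 < c < 1.0:
            cuts.add(c)
    points = [0.0] + sorted(cuts) + [1.0]
    return [(lo + hi) / 2 for lo, hi in zip(points, points[1:])]


def slide_feature(w: Vector, k: int, triplets: Sequence[Triplet]) -> Tuple[List[float], int]:
    """Try each candidate value for w[k]; keep the best-scoring weight vector."""
    best_w, best_score = list(w), count_correct(w, triplets)
    for c in candidate_weights(w, k, triplets):
        trial = list(w)
        trial[k] = c
        score = count_correct(trial, triplets)
        if score > best_score:
            best_w, best_score = trial, score
    return best_w, best_score


def fit_weights(triplets: Sequence[Triplet], n_features: int,
                max_rounds: int = 10) -> Tuple[List[float], int]:
    """Cycle over the features until a full round yields no improvement."""
    w = [0.5] * n_features
    score = count_correct(w, triplets)
    for _ in range(max_rounds):
        previous = score
        for k in range(n_features):
            w, score = slide_feature(w, k, triplets)
        if score == previous:
            break
    return w, score


# Toy data: feature 0 separates matches from mismatches, feature 1 misleads.
triplets = [
    ((0.0, 0.0), (0.1, 0.9), (0.5, 0.1)),
    ((1.0, 1.0), (0.9, 0.0), (0.4, 0.9)),
]
weights, n_correct = fit_weights(triplets, n_features=2)
print(weights, n_correct)
```

Because only one probe per interval between crossovers is evaluated, the number of weight values tried per feature is bounded by the number of triplets rather than by a grid resolution. The closed-form crossover is specific to the weighted L1 distance assumed here; a different metric would need its own crossover computation.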