Department of Statistics and Financial Mathematics, School of Mathematical Sciences, Beijing Normal University, Ministry of Education, Beijing, China.
Math Biosci. 2011 Aug;232(2):96-100. doi: 10.1016/j.mbs.2011.04.007. Epub 2011 May 7.
Identifying protein-coding regions is fundamentally a statistical pattern recognition problem. Discriminant analysis, a statistical technique for classifying observations into predefined classes, is well suited to such problems. Outliers are present in virtually every data set in any application domain, and classical discriminant analysis methods, including linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA), perform poorly when the data contain outliers. To overcome this difficulty, this paper uses robust statistical methods. We choose four different coding characteristics as discriminant variables, and robust discriminant analysis yields satisfactory results.
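To illustrate why outliers matter for a discriminant rule, the following is a minimal sketch, not the method of the paper. It assumes a simplified diagonal Gaussian rule with equal priors and swaps in the coordinate-wise median and scaled median absolute deviation (MAD) as the robust estimates. The abstract does not say which robust estimator or which four discriminant variables were used, so the two features, the toy values, and all function names here are hypothetical.

```python
from math import log
from statistics import mean, median, stdev


def classical_fit(rows):
    """Per-feature mean and sample standard deviation (outlier-sensitive)."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [stdev(c) for c in cols]


def robust_fit(rows):
    """Per-feature median and scaled MAD (assumed robust substitute)."""
    cols = list(zip(*rows))
    centers = [median(c) for c in cols]
    # 1.4826 makes the MAD consistent with the standard deviation
    # under a normal model.
    scales = [1.4826 * median([abs(x - m) for x in c])
              for c, m in zip(cols, centers)]
    return centers, scales


def score(x, centers, scales):
    """Diagonal Gaussian log-density, dropping the shared constant term."""
    return -sum(log(s) + 0.5 * ((xi - c) / s) ** 2
                for xi, c, s in zip(x, centers, scales))


def classify(x, models):
    """Assign x to the class with the highest score (equal priors assumed)."""
    return max(models, key=lambda label: score(x, *models[label]))


# Hypothetical two-feature toy data; the last "coding" row is an outlier.
coding = [(0.8, 1.1), (1.0, 0.9), (1.2, 1.0), (0.9, 1.2),
          (1.1, 0.8), (1.0, 1.0), (9.0, 9.0)]
noncoding = [(2.8, 3.1), (3.0, 2.9), (3.2, 3.0), (2.9, 3.2),
             (3.1, 2.8), (3.0, 3.0)]

classical = {"coding": classical_fit(coding),
             "noncoding": classical_fit(noncoding)}
robust = {"coding": robust_fit(coding),
          "noncoding": robust_fit(noncoding)}

query = (2.5, 2.5)  # lies closer to the non-coding cluster
print("classical:", classify(query, classical))
print("robust:   ", classify(query, robust))
```

On this toy data the single outlier inflates the classical spread estimate of the coding class, so the classical rule assigns the query to the coding class, whereas the median and MAD estimates are unaffected and the robust rule assigns it to the non-coding class. A full robust LDA or QDA would typically replace the diagonal scales with a robust covariance estimate.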