Comparing two K-category assignments by a K-category correlation coefficient.

Author information

Gorodkin J

Affiliation

Center for Bioinformatics and Division of Genetics, IBHV, The Royal Veterinary and Agricultural University, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark.

Publication information

Comput Biol Chem. 2004 Dec;28(5-6):367-74. doi: 10.1016/j.compbiolchem.2004.09.006.

DOI:10.1016/j.compbiolchem.2004.09.006

PMID:15556477

Abstract

Predicted assignments of biological sequences are often evaluated by the Matthews correlation coefficient. However, the Matthews correlation coefficient applies only to cases where the assignments belong to two categories, and cases with more than two categories are often artificially forced into two categories by considering what belongs and what does not belong to one of the categories, leading to a loss of information. Here, an extended correlation coefficient that applies to K categories is proposed, and this measure is shown to be highly applicable for evaluating predictions of RNA secondary structure in cases where some predicted pairs go into the category "unknown" due to a lack of reliability in predicted pairs or unpaired residues. Hence, predicting the base pairs of an RNA secondary structure can be a three-category problem. The measure is further shown to agree well with existing performance measures used for ranking protein secondary structure predictions. A server and software are available at http://rk.kvl.dk/.
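As a minimal sketch, the Python below computes a K-category correlation coefficient from a K×K confusion matrix C using the covariance form R_K = (c·s − Σ_k p_k·t_k) / sqrt((s² − Σ_k p_k²)(s² − Σ_k t_k²)), where s is the total number of items, c is the number of items on the diagonal (agreements), and t_k and p_k are the observed and predicted counts of category k. This is the multiclass form documented for scikit-learn's matthews_corrcoef, which cites this paper, and it reduces to the Matthews correlation coefficient when K = 2. The function names, the rows-observed/columns-predicted convention, and returning 0.0 when the coefficient is undefined are assumptions of this sketch, not details taken from the paper or from the software at http://rk.kvl.dk/.

```python
import math
from typing import Hashable, List, Sequence


def confusion_matrix(observed: Sequence[Hashable], predicted: Sequence[Hashable],
                     categories: Sequence[Hashable]) -> List[List[int]]:
    """Count items per (observed, predicted) category pair.

    Row index = observed category, column index = predicted category
    (a convention assumed for this sketch; hypothetical helper name).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    index = {cat: i for i, cat in enumerate(categories)}
    K = len(categories)
    C = [[0] * K for _ in range(K)]
    for o, p in zip(observed, predicted):
        C[index[o]][index[p]] += 1
    return C


def rk_from_confusion(C: Sequence[Sequence[float]]) -> float:
    """K-category correlation coefficient R_K of a K x K confusion matrix."""
    K = len(C)
    if K == 0 or any(len(row) != K for row in C):
        raise ValueError("C must be a non-empty square matrix")
    s = sum(sum(row) for row in C)                            # total number of items
    c = sum(C[k][k] for k in range(K))                        # items on the diagonal (agreements)
    t = [sum(C[k][j] for j in range(K)) for k in range(K)]    # observed count per category (row sums)
    p = [sum(C[i][k] for i in range(K)) for k in range(K)]    # predicted count per category (column sums)
    cov_tp = c * s - sum(p[k] * t[k] for k in range(K))
    cov_pp = s * s - sum(x * x for x in p)
    cov_tt = s * s - sum(x * x for x in t)
    denom = math.sqrt(cov_pp * cov_tt)
    if denom == 0:
        # One assignment puts every item in a single category, so R_K is 0/0;
        # returning 0.0 here is a convention assumed for this sketch.
        return 0.0
    return cov_tp / denom


if __name__ == "__main__":
    # Hypothetical toy data with three categories.
    observed = ["A", "A", "B", "B", "C", "C"]
    predicted = ["A", "B", "B", "B", "C", "A"]
    C = confusion_matrix(observed, predicted, ["A", "B", "C"])
    print(C)
    print(rk_from_confusion(C))
```

Because the formula is symmetric in the observed and predicted counts, swapping the two assignments (transposing C) gives the same value, so the coefficient does not depend on which assignment is treated as the reference.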
