Evaluating deterministic motif significance measures in protein databases.

Authors

Ferreira Pedro Gabriel, Azevedo Paulo J

Affiliation

Department of Informatics, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal.

Publication

Algorithms Mol Biol. 2007 Dec 24;2:16. doi: 10.1186/1748-7188-2-16.

Abstract

BACKGROUND

Assessing the outcome of motif mining algorithms is an essential task, as the number of reported motifs can be very large. Significance measures play a central role in automatically ranking those motifs, thereby reducing the analysis work. Spotting the most interesting and relevant motifs therefore depends on choosing the right measures. The combined use of several measures may provide more robust results. However, caution must be taken to avoid spurious evaluations.

RESULTS

The experiments showed that several of the selected significance measures behave very similarly across a wide range of situations and therefore provide redundant information. Some measures proved more appropriate for ranking highly conserved motifs, while others are more appropriate for weakly conserved ones. Support appears to be a very important feature to consider for correct motif ranking. We observed that not all measures are suitable for situations with poorly balanced class information, for instance when positive data are considerably scarcer than negative data. Finally, a visualization scheme was proposed that, when several measures are applied, enables easy identification of high-scoring motifs.
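The class-balance observation can be illustrated with a minimal sketch. The abstract does not list the 14 measures, so the four used here (support, recall, precision, lift) are standard choices assumed for illustration only and are not necessarily those of the paper. The sequences, the motif `C..C` (a regular-expression rendering of a C-x(2)-C style pattern), and the function names `contingency` and `measures` are likewise invented for this example.

```python
import re

def contingency(pattern, positives, negatives):
    """Count sequences matched / not matched by a regex motif in each class."""
    rx = re.compile(pattern)
    tp = sum(1 for s in positives if rx.search(s))  # positives containing the motif
    fp = sum(1 for s in negatives if rx.search(s))  # negatives containing the motif
    return tp, fp, len(positives) - tp, len(negatives) - fp

def measures(tp, fp, fn, tn):
    """A few common measures (an assumed selection, not the paper's list)."""
    n = tp + fp + fn + tn
    matched = tp + fp
    prior = (tp + fn) / n                      # fraction of positive sequences
    precision = tp / matched if matched else 0.0
    return {
        "support": matched / n,                # fraction of all sequences matched
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": precision,
        "lift": precision / prior if prior else 0.0,
    }

# Toy data, invented for illustration only.
positives = ["AACKRCA", "GCLMCTT", "TTCAACG", "MMMMMMM"]
negatives = ["AAAAAAA", "CTTCGGG", "GGGGGGG", "LLLLLLL"]
motif = "C..C"

balanced = measures(*contingency(motif, positives, negatives))
# Same motif, same positives, but ten times as many negative sequences.
imbalanced = measures(*contingency(motif, positives, negatives * 10))

for name in balanced:
    print(f"{name:10s} balanced={balanced[name]:.3f} imbalanced={imbalanced[name]:.3f}")
```

In this toy setting recall is unchanged by the extra negatives, while precision falls and lift rises, which is one way a ranking can shift with class balance alone. Whether the measures studied in the paper behave this way is not something the abstract states.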

CONCLUSION

In this work we surveyed and categorized 14 significance measures for pattern evaluation and evaluated their ability to rank three types of deterministic motifs. The measures were applied under different testing conditions, and relations among them were identified. This study provides some pertinent insights into choosing the right set of significance measures for evaluating deterministic motifs extracted from protein databases.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a0d3/2254621/195863382db2/1748-7188-2-16-1.jpg
