Estimating the annotation error rate of curated GO database sequence annotations.

Author information

Jones Craig E, Brown Alfred L, Baumann Ute

Affiliation

School of Computer Science, University of Adelaide, South Australia, Australia.

Publication information

BMC Bioinformatics. 2007 May 22;8:170. doi: 10.1186/1471-2105-8-170.

Abstract

BACKGROUND

Annotations that describe the function of sequences are enormously important to researchers during laboratory investigations and when making computational inferences. However, there has been little investigation into the data quality of sequence function annotations. Here we have developed a new method of estimating the error rate of curated sequence annotations, and applied this to the Gene Ontology (GO) sequence database (GOSeqLite). This method involved artificially adding errors to sequence annotations at known rates, and used regression to model the impact on the precision of annotations based on BLAST matched sequences.
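The error-insertion idea can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a simplified model in which precision falls linearly with the added error rate, that an added error never repairs an existing one, and that the precision of an error-free database (`error_free_precision`) is known. The function names, the simulated data, and the use of "fraction of correct annotations" as a stand-in for BLAST-based precision are all hypothetical.

```python
import random


def add_errors(correct_flags, rate, rng):
    """Corrupt each annotation with probability `rate`.

    An annotation that is already wrong stays wrong (toy assumption).
    """
    return [ok and rng.random() >= rate for ok in correct_flags]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_base_error(added_rates, precisions, error_free_precision=1.0):
    """Extrapolate the fitted line back to the error-free precision.

    Under the toy model P(r) = a * (1 - e0) * (1 - r), the line reaches the
    error-free precision a at r* = -e0 / (1 - e0), so e0 = -r* / (1 - r*).
    """
    slope, intercept = fit_line(added_rates, precisions)
    r_star = (error_free_precision - intercept) / slope
    return -r_star / (1 - r_star)


def simulate(true_base_error=0.3, n=20000, seed=0):
    """Simulate a database with a hidden base error rate and re-estimate it."""
    rng = random.Random(seed)
    base = [rng.random() >= true_base_error for _ in range(n)]
    rates = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
    # Stand-in for precision: the fraction of annotations still correct.
    precisions = [sum(add_errors(base, r, rng)) / n for r in rates]
    return estimate_base_error(rates, precisions)


if __name__ == "__main__":
    print(f"estimated base error rate: {simulate():.3f}")
```

With noise-free inputs that follow the toy model exactly, the extrapolation recovers the hidden base error rate; with real data the estimate depends heavily on how well the linear model and the assumed error-free precision hold.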

RESULTS

We estimated the error rate of curated GO sequence annotations in the GOSeqLite database (March 2006) at between 28% and 30%. Annotations made without the use of sequence-similarity-based methods (non-ISS) had an estimated error rate of between 13% and 18%. Annotations made with the use of sequence-similarity-based methods (ISS) had an estimated error rate of 49%.

CONCLUSION

While the overall error rate is reasonably low, it would be prudent to treat all ISS annotations with caution. Electronic annotators that use ISS annotations as the basis of predictions are likely to have higher false prediction rates, so designers of these systems should consider avoiding ISS annotations where possible, and predictions from annotators that do rely on ISS annotations should be viewed sceptically. We recommend that curators thoroughly review ISS annotations before accepting them as valid. Overall, users of curated sequence annotations from the GO database should feel assured that they are using a comparatively high-quality source of information.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58fe/1892569/281fbfe7cbcc/1471-2105-8-170-1.jpg
