Nagy Alinda, Patthy László
Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1113 Budapest, Hungary.
Database (Oxford). 2014 Apr 4;2014:bau032. doi: 10.1093/database/bau032. Print 2014.
Protein databases are heavily contaminated with erroneous (mispredicted, abnormal and incomplete) sequences, and these erroneous data significantly distort the conclusions drawn from genome-scale protein sequence analyses. In our earlier work we described the MisPred resource, which serves to identify erroneous sequences; here we present the FixPred computational pipeline, which automatically corrects sequences identified by MisPred as erroneous. The current version of the associated FixPred database contains corrected UniProtKB/Swiss-Prot and NCBI/RefSeq sequences from Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Danio rerio, Fugu rubripes, Ciona intestinalis, Branchiostoma floridae, Drosophila melanogaster and Caenorhabditis elegans; future releases of the FixPred database will include corrected sequences of additional metazoan species. The FixPred computational pipeline and database (http://www.fixpred.com) are easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content can be downloaded in full or in part in a variety of formats. Database URL: http://www.fixpred.com.