Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade 11001, Serbia.
Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université de Montpellier, Montpellier 34293, France.
Bioinformatics. 2020 May 1;36(10):3260-3262. doi: 10.1093/bioinformatics/btaa121.
Proteins containing tandem repeats (TRs) are abundant, frequently fold into elongated non-globular structures and perform vital functions. A number of computational tools have been developed to detect TRs in protein sequences. Because the boundary between imperfect TR motifs and non-repetitive sequences is blurred, the detected TRs need to be validated.
Tally-2.0 is a scoring tool based on a machine learning (ML) approach that validates the results of TR detection. It was upgraded with improved training datasets and additional ML features. Tally-2.0 achieves 93% sensitivity, 83% specificity and an area under the receiver operating characteristic curve of 95%.
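For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how sensitivity, specificity and the area under the ROC curve can be computed for any binary scoring tool. It is not Tally-2.0 code: the labels, scores, the 0.5 threshold and the function names are all hypothetical and chosen only for illustration.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    tp = fn = tn = fp = 0
    for is_repeat, score in zip(labels, scores):
        predicted = score >= threshold
        if is_repeat and predicted:
            tp += 1
        elif is_repeat:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = true tandem repeat, 0 = non-repetitive sequence.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
auc = roc_auc(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality over all thresholds, which is why the three values are usually reported together.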
Tally-2.0 is available as a web tool and as a standalone application, published under Apache License 2.0, at https://bioinfo.crbm.cnrs.fr/index.php?route=tools&tool=27. It is supported on Linux. Source code is available upon request.
Supplementary data are available at Bioinformatics online.