Tally-2.0: upgraded validator of tandem repeat detection in protein sequences.

Affiliations

Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade 11001, Serbia.

Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université de Montpellier, Montpellier 34293, France.

Publication information

Bioinformatics. 2020 May 1;36(10):3260-3262. doi: 10.1093/bioinformatics/btaa121.

Abstract

MOTIVATION

Proteins containing tandem repeats (TRs) are abundant, frequently fold into elongated non-globular structures and perform vital functions. A number of computational tools have been developed to detect TRs in protein sequences. The blurred boundary between imperfect TR motifs and non-repetitive sequences makes it necessary to validate the detected TRs.

RESULTS

Tally-2.0 is a scoring tool based on a machine learning (ML) approach that makes it possible to validate the results of TR detection. It was upgraded with improved training datasets and additional ML features. Tally-2.0 achieves 93% sensitivity, 83% specificity and an area under the receiver operating characteristic curve of 95%.
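For readers less familiar with the three reported metrics, the following is a minimal sketch of how sensitivity, specificity and the area under the ROC curve can be computed for any binary scoring tool. It is not Tally-2.0's code: the function names, the 0.5 threshold and the toy labels and scores are all assumptions made up for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity); scores >= threshold are called positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)  # true TRs accepted
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)   # true TRs rejected
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)   # non-TRs rejected
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)  # non-TRs accepted
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = genuine tandem repeat, 0 = non-repetitive sequence.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
auc = roc_auc(labels, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen score threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why a tool is typically reported with all three.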

AVAILABILITY AND IMPLEMENTATION

Tally-2.0 is available as a web tool and as a standalone application, published under Apache License 2.0, at https://bioinfo.crbm.cnrs.fr/index.php?route=tools&tool=27. It is supported on Linux. Source code is available upon request.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
