cath-resolve-hits: a new tool that resolves domain matches suspiciously quickly.

Author affiliations

Department of Structural and Molecular Biology, UCL, Darwin Building, London, UK.

Department of Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, Oxford, Oxfordshire, UK.

Publication information

Bioinformatics. 2019 May 15;35(10):1766-1767. doi: 10.1093/bioinformatics/bty863.

Abstract

MOTIVATION

Many areas of bioinformatics require assigning domain matches to stretches of a query protein. Starting with a set of candidate matches, we want to identify the optimal subset that has limited or no overlap between matches. This may be further complicated by discontinuous domains in the input data. Existing tools increasingly face very large datasets, for which they require prohibitive amounts of CPU time and memory.

RESULTS

We present cath-resolve-hits (CRH), a new tool that uses a dynamic-programming algorithm implemented in open-source C++ to handle large datasets quickly (up to ∼1 million hits/second) and in reasonable amounts of memory. It accepts multiple input formats and provides its output in plain text, JSON or graphical HTML. We describe a benchmark against an existing algorithm, which shows CRH delivers very similar or slightly improved results and very much improved CPU/memory performance on large datasets.
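To illustrate the kind of dynamic programming involved, the following is a minimal Python sketch of weighted interval scheduling: given scored hits on a query sequence, it picks the highest-scoring subset with no overlap. It is not CRH's implementation. The `Hit` type, the `resolve_hits` function, the integer residue coordinates and the example scores are all assumptions made for illustration, and the sketch handles only continuous domains with zero permitted overlap.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """A hypothetical candidate domain match on the query sequence."""
    label: str
    start: int    # first residue covered (inclusive)
    stop: int     # last residue covered (inclusive)
    score: float  # higher is better


def resolve_hits(hits: List[Hit]) -> Tuple[float, List[Hit]]:
    """Return the best total score and one non-overlapping subset achieving it."""
    ordered = sorted(hits, key=lambda h: h.stop)
    stops = [h.stop for h in ordered]

    # best[i]: best total score using only the first i hits of `ordered`
    best = [0.0] * (len(ordered) + 1)
    # prev[i]: how many of the earlier hits end before hit i starts
    prev = [0] * (len(ordered) + 1)

    for i, hit in enumerate(ordered, start=1):
        prev[i] = bisect_right(stops, hit.start - 1, 0, i - 1)
        # Either skip this hit, or take it plus the best compatible prefix
        best[i] = max(best[i - 1], best[prev[i]] + hit.score)

    # Walk back through the table to recover which hits were taken
    chosen: List[Hit] = []
    i = len(ordered)
    while i > 0:
        hit = ordered[i - 1]
        if best[prev[i]] + hit.score > best[i - 1]:
            chosen.append(hit)
            i = prev[i]
        else:
            i -= 1
    chosen.reverse()
    return best[-1], chosen


if __name__ == "__main__":
    example = [
        Hit("A", 1, 50, 10.0),
        Hit("B", 40, 90, 12.0),
        Hit("C", 60, 120, 9.0),
        Hit("D", 95, 150, 5.0),
    ]
    total, chosen = resolve_hits(example)
    print(total, [h.label for h in chosen])
```

In this example the single best hit, B, overlaps both A and C, so the sketch selects A and C together for a total score of 19.0. Sorting plus one binary search per hit gives O(n log n) time, which is why this family of algorithms scales to large hit sets. Supporting discontinuous domains or a tolerated overlap, as the abstract describes, would need a more elaborate formulation.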

AVAILABILITY AND IMPLEMENTATION

CRH is available at https://github.com/UCLOrengoGroup/cath-tools; documentation is available at http://cath-tools.readthedocs.io.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
