A fast and automated solution for accurately resolving protein domain architectures.

Affiliation

Department of Structural and Molecular Biology, UCL, London WC1E 6BT, UK.

Publication information

Bioinformatics. 2010 Mar 15;26(6):745-51. doi: 10.1093/bioinformatics/btq034. Epub 2010 Jan 29.

DOI:10.1093/bioinformatics/btq034
PMID:20118117
Abstract

MOTIVATION

Accurate prediction of the domain content and arrangement in multi-domain proteins (which make up >65% of the large-scale protein databases) provides a valuable tool for function prediction, comparative genomics and studies of molecular evolution. However, scanning a multi-domain protein against a database of domain sequence profiles can often produce conflicting and overlapping matches. We have developed a novel method that employs heaviest weighted clique-finding (HCF), which we show significantly outperforms standard published approaches based on successively assigning the best non-overlapping match (Best Match Cascade, BMC).
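The Best Match Cascade baseline described above can be illustrated with a minimal sketch. The `Match` record, its field names and the example scores are all hypothetical and are not taken from the paper or from the DomainFinder3 source; the only thing the sketch assumes from the abstract is the rule itself: repeatedly keep the best-scoring match that does not overlap a match already kept.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """A hypothetical domain match on a query sequence (illustrative only)."""
    domain: str   # made-up domain family label
    start: int    # first matched residue, inclusive
    end: int      # last matched residue, inclusive
    score: float  # higher is better, e.g. a bit score


def overlaps(a: Match, b: Match) -> bool:
    """True if the two matches share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def best_match_cascade(matches: list[Match]) -> list[Match]:
    """Greedy baseline: take matches in descending score order and keep
    each one only if it overlaps none of the matches kept so far."""
    kept: list[Match] = []
    for m in sorted(matches, key=lambda m: m.score, reverse=True):
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m.start)


# Invented example: the single best match "B" blocks both "A" and "D".
matches = [
    Match("A", 1, 100, 50.0),
    Match("B", 90, 200, 60.0),
    Match("C", 201, 300, 40.0),
    Match("D", 101, 200, 45.0),
]

for m in best_match_cascade(matches):
    print(m.domain, m.start, m.end, m.score)
```

In this invented case the greedy pass keeps B and C (combined score 100), although A, D and C are mutually non-overlapping and sum to 135. This is the kind of conflict that motivates considering combinations of matches rather than assigning them one at a time.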

RESULTS

We created a benchmark data set of structural domain assignments in the CATH database and a corresponding set of Hidden Markov Model-based domain predictions. Using these, we demonstrate that by considering all possible combinations of matches using the HCF approach, we achieve much higher prediction accuracy than the standard BMC method. We also show that it is essential to allow overlapping domain matches to a query in order to identify correct domain assignments. Furthermore, we introduce a straightforward and effective protocol for resolving any overlapping assignments and producing a single set of non-overlapping predicted domains.
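The idea of heaviest weighted clique-finding can be sketched as follows. This is not the authors' implementation: the `Match` record, the `max_overlap` tolerance, the brute-force enumeration and the midpoint rule in `resolve_overlaps` are all assumptions made for illustration. Matches are treated as nodes of a graph, two matches are connected when they are compatible (here, when they overlap by at most `max_overlap` residues), and the chosen architecture is the mutually compatible set with the highest total score.

```python
from itertools import combinations
from typing import NamedTuple


class Match(NamedTuple):
    """A hypothetical domain match on a query sequence (illustrative only)."""
    domain: str
    start: int    # inclusive
    end: int      # inclusive
    score: float  # higher is better


def overlap_len(a: Match, b: Match) -> int:
    """Number of residues shared by two matches."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def heaviest_clique(matches: list[Match], max_overlap: int = 10) -> list[Match]:
    """Exhaustively test every subset and return the pairwise-compatible one
    with the largest summed score. Exponential time: a teaching sketch only."""
    best: tuple[Match, ...] = ()
    best_score = 0.0
    for r in range(1, len(matches) + 1):
        for subset in combinations(matches, r):
            if all(overlap_len(a, b) <= max_overlap
                   for a, b in combinations(subset, 2)):
                total = sum(m.score for m in subset)
                if total > best_score:
                    best, best_score = subset, total
    return sorted(best, key=lambda m: m.start)


def resolve_overlaps(chosen: list[Match]) -> list[Match]:
    """Assumed rule (not from the paper): cut each remaining overlap between
    neighbouring matches at its midpoint. Expects input sorted by start."""
    out: list[Match] = []
    for m in chosen:
        if out and m.start <= out[-1].end:
            cut = (m.start + out[-1].end) // 2
            out[-1] = out[-1]._replace(end=cut)
            m = m._replace(start=cut + 1)
        out.append(m)
    return out


# Invented example: "A" and "D" overlap by 6 residues; "B" conflicts with both.
matches = [
    Match("A", 1, 105, 50.0),
    Match("B", 60, 200, 60.0),
    Match("C", 201, 300, 40.0),
    Match("D", 100, 200, 45.0),
]

tolerant = heaviest_clique(matches, max_overlap=10)
strict = heaviest_clique(matches, max_overlap=0)
print([m.domain for m in tolerant])
print([m.domain for m in strict])
print([(m.domain, m.start, m.end) for m in resolve_overlaps(tolerant)])
```

With the tolerance set to 10 residues the sketch selects A, D and C; with no overlap allowed it falls back to B and C, which mirrors the abstract's point that small overlaps must be permitted during selection and resolved afterwards. A practical implementation would need a proper maximum-weight clique algorithm rather than subset enumeration.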

AVAILABILITY AND IMPLEMENTATION

The new approach will be used to determine multi-domain architectures (MDAs) for UniProt and Ensembl, and will be made available via the Gene3D website: http://gene3d.biochem.ucl.ac.uk/Gene3D/. The software has been implemented in C++ and compiled for Linux; source code and binaries can be found at: ftp://ftp.biochem.ucl.ac.uk/pub/gene3d_data/DomainFinder3/

CONTACT

yeats@biochem.ucl.ac.uk

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
