Pandit: a database of protein and associated nucleotide domains with inferred trees.

Authors

Whelan Simon, de Bakker Paul I W, Goldman Nick

Affiliation

Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK.

Publication information

Bioinformatics. 2003 Aug 12;19(12):1556-63. doi: 10.1093/bioinformatics/btg188.

DOI: 10.1093/bioinformatics/btg188
PMID: 12912837

Abstract

MOTIVATION

A large, high-quality database of homologous sequence alignments with good estimates of their corresponding phylogenetic trees will be a valuable resource to those studying phylogenetics. It will allow researchers to compare current and new models of sequence evolution across a large variety of sequences. The large quantity of data may provide inspiration for new models and methodology to study sequence evolution and may allow general statements about the relative effect of different molecular processes on evolution.

RESULTS

The Pandit 7.6 database contains 4341 families of sequences derived from the seed alignments of the Pfam database of amino acid alignments of families of homologous protein domains (Bateman et al., 2002). Each family in Pandit includes an alignment of amino acid sequences that matches the corresponding Pfam family seed alignment, an alignment of DNA sequences that contain the coding sequence of the Pfam alignment when they can be recovered (overall, 82.9% of sequences taken from Pfam) and the alignment of amino acid sequences restricted to only those sequences for which a DNA sequence could be recovered. Each of the alignments has an estimate of the phylogenetic tree associated with it. The tree topologies were obtained using the neighbor joining method based on maximum likelihood estimates of the evolutionary distances, with branch lengths then calculated using a standard maximum likelihood approach.
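
The tree-building step described above (pairwise distances estimated by maximum likelihood, then a neighbor joining topology) can be illustrated with a minimal sketch. It rests on assumptions that are not in the abstract: the Jukes-Cantor (JC69) model stands in for whatever substitution models Pandit actually used, the function names `jc69_distance` and `neighbor_joining` are hypothetical, and the final maximum likelihood re-estimation of branch lengths is not attempted here (the branch lengths printed are the plain neighbor joining ones).

```python
import math
from itertools import combinations


def jc69_distance(seq_a, seq_b):
    """ML estimate of the distance between two aligned DNA sequences
    under JC69 (an assumed stand-in model). Gapped columns are skipped."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)  # proportion of differing sites
    if p >= 0.75:
        raise ValueError("sequences too divergent for JC69")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor joining and return it as Newick.

    names: list of at least three unique taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all the others.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair that minimises the Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        len_i = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        len_j = dij - len_i
        new = f"({i}:{len_i:.4f},{j}:{len_j:.4f})"  # subtree label doubles as node id
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    # Three nodes remain: connect them at a single internal node.
    a, b, c = nodes
    dab, dac, dbc = d[frozenset((a, b))], d[frozenset((a, c))], d[frozenset((b, c))]
    return (
        f"({a}:{0.5 * (dab + dac - dbc):.4f},"
        f"{b}:{0.5 * (dab + dbc - dac):.4f},"
        f"{c}:{0.5 * (dac + dbc - dab):.4f});"
    )


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    seqs = {
        "s1": "ACGTACGTAC",
        "s2": "ACGTACGTAA",
        "s3": "ACGAACGTTA",
        "s4": "ACGAACCTTA",
    }
    dist = {
        frozenset((a, b)): jc69_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }
    print(neighbor_joining(list(seqs), dist))
```

Using distances rather than a full likelihood search keeps the topology step cheap, which matters when trees are needed for thousands of families; a production pipeline would then re-optimise the branch lengths on the fixed topology with a likelihood program.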

Similar articles

1
Pandit: a database of protein and associated nucleotide domains with inferred trees.
Bioinformatics. 2003 Aug 12;19(12):1556-63. doi: 10.1093/bioinformatics/btg188.
2
PANDIT: an evolution-centric database of protein and associated nucleotide domains with inferred trees.
Nucleic Acids Res. 2006 Jan 1;34(Database issue):D327-31. doi: 10.1093/nar/gkj087.
3
Identifying protein domains with the Pfam database.
Curr Protoc Bioinformatics. 2003 May;Chapter 2:Unit 2.5. doi: 10.1002/0471250953.bi0205s01.
4
EvoDB: a database of evolutionary rate profiles, associated protein domains and phylogenetic trees for PFAM-A.
Database (Oxford). 2015 Jul 2;2015:bav065. doi: 10.1093/database/bav065. Print 2015.
5
An improved general amino acid replacement matrix.
Mol Biol Evol. 2008 Jul;25(7):1307-20. doi: 10.1093/molbev/msn067. Epub 2008 Mar 26.
6
On the quality of tree-based protein classification.
Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.
7
QAlign: quality-based multiple alignments with dynamic phylogenetic analysis.
Bioinformatics. 2003 Aug 12;19(12):1592-3. doi: 10.1093/bioinformatics/btg197.
8
Integrating protein structures and precomputed genealogies in the Magnum database: examples with cellular retinoid binding proteins.
BMC Bioinformatics. 2006 Feb 23;7:89. doi: 10.1186/1471-2105-7-89.
9
QuickTree: building huge Neighbour-Joining trees of protein sequences.
Bioinformatics. 2002 Nov;18(11):1546-7. doi: 10.1093/bioinformatics/18.11.1546.
10
The use of structure information to increase alignment accuracy does not aid homologue detection with profile HMMs.
Bioinformatics. 2002 Sep;18(9):1243-9. doi: 10.1093/bioinformatics/18.9.1243.

Cited by

1
Predicting Phylogenetic Bootstrap Values via Machine Learning.
Mol Biol Evol. 2024 Oct 4;41(10). doi: 10.1093/molbev/msae215.
2
Harnessing machine learning to guide phylogenetic-tree search algorithms.
Nat Commun. 2021 Mar 31;12(1):1983. doi: 10.1038/s41467-021-22073-8.
3
Fructosyltransferase production by Aspergillus oryzae BM-DIA using solid-state fermentation and the properties of its nucleotide and protein sequences.
Folia Microbiol (Praha). 2021 Jun;66(3):469-481. doi: 10.1007/s12223-021-00862-4. Epub 2021 Mar 26.
4
Ambiguity Coding Allows Accurate Inference of Evolutionary Parameters from Alignments in an Aggregated State-Space.
Syst Biol. 2021 Jan 1;70(1):21-32. doi: 10.1093/sysbio/syaa036.
5
Model selection may not be a mandatory step for phylogeny reconstruction.
Nat Commun. 2019 Feb 25;10(1):934. doi: 10.1038/s41467-019-08822-w.
6
Alignment Modulates Ancestral Sequence Reconstruction Accuracy.
Mol Biol Evol. 2018 Jul 1;35(7):1783-1797. doi: 10.1093/molbev/msy055.
7
Evidence of Statistical Inconsistency of Phylogenetic Methods in the Presence of Multiple Sequence Alignment Uncertainty.
Genome Biol Evol. 2015 Jul 1;7(8):2102-16. doi: 10.1093/gbe/evv127.
8
A generalized mechanistic codon model.
Mol Biol Evol. 2014 Sep;31(9):2528-41. doi: 10.1093/molbev/msu196. Epub 2014 Jun 23.
9
Resolving discrepancy between nucleotides and amino acids in deep-level arthropod phylogenomics: differentiating serine codons in 21-amino-acid models.
PLoS One. 2012;7(11):e47450. doi: 10.1371/journal.pone.0047450. Epub 2012 Nov 20.
10
Phylogenetic properties of RNA viruses.
PLoS One. 2012;7(9):e44849. doi: 10.1371/journal.pone.0044849. Epub 2012 Sep 20.