On combining protein sequences and nucleic acid sequences in phylogenetic analysis: the homeobox protein case.

Authors

Agosti D, Jacobs D, DeSalle R

Affiliation

Department of Entomology, American Museum of Natural History, New York 10024, USA.

Publication information

Cladistics. 1996;12:65-82. doi: 10.1111/j.1096-0031.1996.tb00193.x.

PMID:11541749

Abstract

Amino acid-encoding genes contain character state information that may be useful for phylogenetic analysis on at least two levels. The nucleotide sequence and the translated amino acid sequence have both been employed separately as character states for cladistic studies of various taxa, including studies of the genealogy of genes in multigene families. In essence, amino acid sequences and nucleic acid sequences are two different ways of character coding the information in a gene. Silent positions in the nucleotide sequence (first or third positions in codons that can accrue change without changing the identity of the amino acid that the triplet codes for) may accrue change relatively rapidly and become saturated, losing the pattern of historical divergence. On the other hand, non-silent nucleotide alterations and their accompanying amino acid changes may evolve too slowly to reveal relationships among closely related taxa. In general, the dynamics of sequence change in silent and non-silent positions in protein coding genes result in homoplasy and lack of resolution, respectively. We suggest that combining the nucleic acid and the translated amino acid character states in the same data matrix for phylogenetic analysis addresses some of the problems caused by the rapid change of silent nucleotide positions and the overall slow rate of change of non-silent nucleotide positions and slowly changing amino acid positions. One major theoretical problem with this approach is the apparent non-independence of the two sources of characters. However, there are at least three possible outcomes when comparing protein coding nucleic acid sequences with their translated amino acids in a phylogenetic context on a codon by codon basis. First, the two character sets for a codon may be entirely congruent with respect to the information they convey about the relationships of a certain set of taxa. Second, one character set may display no information concerning a phylogenetic hypothesis while the other character set may contribute information to a hypothesis. These two possibilities are cases of non-independence; however, we argue that congruence in such cases can be thought of as increasing the weight of the particular phylogenetic hypothesis that is supported by those characters. In the third case, the two sources of character information for a particular codon may be entirely incongruent with respect to phylogenetic hypotheses concerning the taxa examined. In this last case the two character sets are independent in that information from neither can predict the character states of the other. Examples of these possibilities are discussed, and the general applicability of combining these two sources of information for protein coding genes is presented using sequences from the homeobox region of 46 homeobox genes from Drosophila melanogaster to develop a hypothesis of genealogical relationship of these genes in this large multigene family.
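The idea of coding one gene both ways can be illustrated with a minimal Python sketch. It is not the authors' procedure: the three sequences and the names `geneA`, `geneB`, `geneC`, `combined_matrix`, and `codon_report` are hypothetical, the alignment is assumed to be in-frame, ungapped, and uppercase, and the standard genetic code is assumed. The sketch appends the translated amino acid characters to the nucleotide characters for each taxon and flags, codon by codon, whether each character set varies at all. It does not measure congruence with a tree, which would require an actual cladistic analysis.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an in-frame, ungapped DNA string into one-letter amino acids."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def combined_matrix(aligned):
    """Append the amino acid characters to the nucleotide characters per taxon."""
    return {name: seq + translate(seq) for name, seq in aligned.items()}


def codon_report(aligned):
    """For each codon, report (index, nucleotides vary, amino acids vary)."""
    names = list(aligned)
    n_codons = len(aligned[names[0]]) // 3
    report = []
    for k in range(n_codons):
        codons = [aligned[n][3 * k:3 * k + 3] for n in names]
        nt_varies = len(set(codons)) > 1
        aa_varies = len({CODON_TABLE[c] for c in codons}) > 1
        report.append((k, nt_varies, aa_varies))
    return report


# Hypothetical toy alignment (not data from the paper).
aligned = {
    "geneA": "CTGAAACGC",  # Leu Lys Arg
    "geneB": "TTGAAACGC",  # Leu Lys Arg: silent first-position change in codon 0
    "geneC": "CTGAGACGC",  # Leu Arg Arg: replacement change in codon 1
}

matrix = combined_matrix(aligned)
report = codon_report(aligned)
for name, row in matrix.items():
    print(name, row)
print(report)
```

In this toy case codon 0 varies only at the nucleotide level (a silent change), codon 1 varies in both character sets, and codon 2 is invariant, so only the nucleotide characters of codon 0 could carry any signal there.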

Similar articles

Sequence evolution in mitochondrial ribosomal and ND-1 genes in lepidoptera: implications for phylogenetic analyses.

Mol Biol Evol. 1992 Nov;9(6):1061-75. doi: 10.1093/oxfordjournals.molbev.a040778.

Mutation and selection at silent and replacement sites in the evolution of animal mitochondrial DNA.

Genetica. 1998;102-103(1-6):393-407.

The mitochondrial genome of the honeybee Apis mellifera: complete sequence and genome organization.

Genetics. 1993 Jan;133(1):97-117. doi: 10.1093/genetics/133.1.97.

Mitochondrial phylogeny of Anura (Amphibia): a case study of congruent phylogenetic reconstruction using amino acid and nucleotide characters.

Gene. 2006 Feb 1;366(2):228-37. doi: 10.1016/j.gene.2005.07.034. Epub 2005 Nov 22.

Homoplasy in genome-wide analysis of rare amino acid replacements: the molecular-evolutionary basis for Vavilov's law of homologous series.

Biol Direct. 2008 Mar 17;3:7. doi: 10.1186/1745-6150-3-7.

Substitution bias, rapid saturation, and the use of mtDNA for nematode systematics.

Mol Biol Evol. 1998 Dec;15(12):1719-27. doi: 10.1093/oxfordjournals.molbev.a025898.

Granule-bound starch synthase: structure, function, and phylogenetic utility.

Mol Biol Evol. 1998 Dec;15(12):1658-73. doi: 10.1093/oxfordjournals.molbev.a025893.

The Strepsiptera problem: phylogeny of the holometabolous insect orders inferred from 18S and 28S ribosomal DNA sequences and morphology.

Syst Biol. 1997 Mar;46(1):1-68. doi: 10.1093/sysbio/46.1.1.

Mitochondrial DNA sequences and multiple data sets: a phylogenetic study of phytophagous beetles (Chrysomelidae: Ophraella).

Mol Biol Evol. 1995 Jul;12(4):627-40. doi: 10.1093/oxfordjournals.molbev.a040242.

Cited by

Comparative transcriptomic and evolutionary analysis of FAD-like genes of Brassica species revealed their role in fatty acid biosynthesis and stress tolerance.

BMC Plant Biol. 2023 May 12;23(1):250. doi: 10.1186/s12870-023-04232-9.

Transformation Series as an Ideographic Character Concept.

Cladistics. 2004 Feb;20(1):23-31. doi: 10.1111/j.1096-0031.2004.00003.x.

Phosphotyrosine phosphatase R3 receptors: Origin, evolution and structural diversification.

PLoS One. 2017 Mar 3;12(3):e0172887. doi: 10.1371/journal.pone.0172887. eCollection 2017.

Generation of divergent uroplakin tetraspanins and their partners during vertebrate evolution: identification of novel uroplakins.

BMC Evol Biol. 2014 Jan 23;14:13. doi: 10.1186/1471-2148-14-13.

Ancient origins of vertebrate-specific innate antiviral immunity.

Mol Biol Evol. 2014 Jan;31(1):140-53. doi: 10.1093/molbev/mst184. Epub 2013 Oct 8.

The evolution of the major hepatitis C genotypes correlates with clinical response to interferon therapy.

PLoS One. 2009 Aug 11;4(8):e6579. doi: 10.1371/journal.pone.0006579.

Evidence, content and corroboration and the Tree of Life.

Acta Biotheor. 2009 Jun;57(1-2):187-99. doi: 10.1007/s10441-008-9066-5. Epub 2008 Nov 18.

Phylogenetic incongruence among oncogenic genital alpha human papillomaviruses.

J Virol. 2005 Dec;79(24):15503-10. doi: 10.1128/JVI.79.24.15503-15510.2005.

sine oculis in basal Metazoa.

Dev Genes Evol. 2004 Jul;214(7):342-51. doi: 10.1007/s00427-004-0407-3. Epub 2004 Jun 25.

Did homeodomain proteins duplicate before the origin of angiosperms, fungi, and metazoa?

Proc Natl Acad Sci U S A. 1997 Dec 9;94(25):13749-53. doi: 10.1073/pnas.94.25.13749.
