Mushegian A R, Garey J R, Martin J, Liu L X
AxyS Pharmaceuticals, Inc., La Jolla, California 92037, USA.
Genome Res. 1998 Jun;8(6):590-8. doi: 10.1101/gr.8.6.590.
Comparisons of DNA and protein sequences between humans and model organisms, including the yeast Saccharomyces cerevisiae, the nematode Caenorhabditis elegans, and the fruit fly Drosophila melanogaster, are a significant source of information about the function of human genes and proteins in both normal and disease states. Important questions regarding cross-species sequence comparison remain unanswered, including (1) the fraction of the metabolic, signaling, and regulatory pathways that is shared by humans and the various model organisms; and (2) the validity of functional inferences based on sequence homology. We addressed these questions by analyzing the available fractions of human, fly, nematode, and yeast genomes for orthologous protein-coding genes, applying strict criteria to distinguish between candidate orthologous and paralogous proteins. Forty-two quartets of proteins could be identified as candidate orthologs. Twenty-four Drosophila protein sequences were more similar to their human orthologs than the corresponding nematode proteins. Analysis of sequence substitutions and evolutionary distances in this data set revealed that most C. elegans genes are evolving more rapidly than Drosophila genes, suggesting that unequal evolutionary rates may contribute to the differences in similarity to human protein sequences. The available fraction of Drosophila proteins appears to lack representatives of many protein families and domains, reflecting the relative paucity of genomic data from this species.