The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study.

Author information

Nekrutenko Anton, Makova Kateryna D, Li Wen-Hsiung

Affiliation

Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA.

Publication information

Genome Res. 2002 Jan;12(1):198-202. doi: 10.1101/gr.200901.

DOI:10.1101/gr.200901

PMID:11779845

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC155263/

Abstract

Comparative genomics is a simple, powerful way to increase the accuracy of gene prediction. In this study, we show the utility of a simple test for identifying protein-coding exons using human/mouse sequence comparisons. The test takes advantage of the fact that in the vast majority of coding regions, synonymous substitutions (K(S)) occur much more frequently than nonsynonymous ones (K(A)), and it uses the K(A)/K(S) ratio as the criterion. We show the following: (1) most human and mouse exons are sufficiently long and have a suitable degree of sequence divergence for the test to perform reliably; (2) the test is suited to the identification of long exons and single-exon genes, which are difficult to predict with current methods; (3) the test has a false-negative rate lower than that of most current gene-prediction methods and a false-positive rate lower than that of all current methods; (4) the test has been automated and can be used in combination with other existing gene-prediction methods.
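
The criterion described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the two sequences are already aligned, in frame, and of equal length; it uses a simplified Nei-Gojobori-style count of synonymous and nonsynonymous sites with a Jukes-Cantor correction as an assumed stand-in for whatever estimator the paper used; and it omits the statistical test of whether the ratio is significantly below 1. All function names and the toy sequences are hypothetical.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Synonymous sites in a sense codon (0 to 3).

    Simplification: changes that create a stop codon count as nonsynonymous.
    """
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    total += 1 / 3
    return total


def count_diffs(c1, c2):
    """(synonymous, nonsynonymous) differences between two sense codons,
    averaged over all orders of single-base changes that avoid stop codons."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    if not diff_pos:
        return 0.0, 0.0
    syn = non = 0.0
    paths = 0
    for order in permutations(diff_pos):
        cur, s, n, valid = c1, 0, 0, True
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODON_TABLE[nxt] == "*":
                valid = False
                break
            if CODON_TABLE[nxt] == CODON_TABLE[cur]:
                s += 1
            else:
                n += 1
            cur = nxt
        if valid:
            syn, non, paths = syn + s, non + n, paths + 1
    if paths == 0:
        # Every path crosses a stop codon: treat all changes as nonsynonymous.
        return 0.0, float(len(diff_pos))
    return syn / paths, non / paths


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of differing sites."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        return float("nan")  # saturated; the correction is undefined
    return -0.75 * log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks) for two aligned, in-frame sequences."""
    S = N = Sd = Nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # skip gaps and ambiguous bases
        if CODON_TABLE[c1] == "*" or CODON_TABLE[c2] == "*":
            continue  # skip stop codons
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S, N = S + s, N + (3 - s)
        sd, nd = count_diffs(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    if S == 0 or N == 0:
        return float("nan"), float("nan")
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


# Toy sequences (not real data): only third positions of alanine codons differ.
human = "ATG" + "GCT" * 10
mouse = "ATG" + "GCC" + "GCT" * 8 + "GCA"
ka, ks = ka_ks(human, mouse)
ratio = ka / ks if ks > 0 else float("nan")
print(f"Ka={ka:.3f} Ks={ks:.3f} Ka/Ks={ratio:.3f}")
```

In this sketch, a region whose ratio is well below 1 behaves as the abstract expects a coding region to behave, with synonymous changes outnumbering nonsynonymous ones. A practical version would also need a significance test and a way to choose the reading frame, neither of which is shown here.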
