
Distinctive sequence features in protein coding, genic non-coding, and intergenic human DNA.

Authors

Guigó R, Fickett JW

Affiliation

Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, NM 87545, USA.

Publication information

J Mol Biol. 1995 Oct 13;253(1):51-60. doi: 10.1006/jmbi.1995.0535.

DOI: 10.1006/jmbi.1995.0535
PMID: 7473716
Abstract

We have studied the behavior of a number of sequence statistics, mostly indicative of protein coding function, in a large set of human clone sequences randomly selected in the course of genome mapping (randomly selected clone sequences), and compared this with the behavior in known sequences containing genes (which we term genic sequences). As expected, given the higher coding density of the genic sequences, the sequence statistics studied behave in a substantially different manner in the randomly selected clone sequences (mostly intergenic DNA) and in the genic sequences. Strong differences in behavior of a number of such statistics are also observed, however, when the randomly selected clone sequences are compared with only the non-coding fraction of the genic sequences, suggesting that intergenic and genic non-coding DNA constitute two different classes of non-coding DNA. By studying the behavior of the sequence statistics in simulated DNA of different C+G content, we have observed that a number of them are strongly dependent on C+G content. Thus, most differences between intergenic and genic non-coding DNA can be explained by differences in C+G content. A+T-rich intergenic DNA appears to be at the compositional equilibrium expected under random mutation, while C+G richer non-coding genic DNA is far from this equilibrium. The results obtained in simulated DNA indicate, on the other hand, that a very large fraction of the variation in the coding statistics that underlie gene identification algorithms is due simply to C+G content, and is not directly related to protein coding function. It appears, thus, that the performance of gene-finding algorithms should be improved by carefully distinguishing the effects of protein coding function from those of mere base compositional variation on such coding statistics.
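The dependence of coding statistics on C+G content described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure or their set of statistics. It assumes a simple independent-base model for simulated DNA and uses one illustrative statistic, the longest stop-free run of codons, which is often treated as a sign of coding potential. All function names are hypothetical.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def simulate_dna(length, gc_content, rng):
    """Draw independent bases with P(C) = P(G) = gc/2 and P(A) = P(T) = (1 - gc)/2.

    Assumption: an i.i.d. base model, chosen only for illustration.
    """
    at = (1.0 - gc_content) / 2.0
    gc = gc_content / 2.0
    return "".join(rng.choices("ACGT", weights=[at, gc, gc, at], k=length))


def gc_fraction(seq):
    """Fraction of bases in seq that are C or G."""
    return (seq.count("C") + seq.count("G")) / len(seq)


def expected_stop_frequency(gc_content):
    """Probability that a random codon is TAA, TAG or TGA under the i.i.d. model."""
    at = (1.0 - gc_content) / 2.0
    gc = gc_content / 2.0
    # TAA contributes at^3; TAG and TGA each contribute at^2 * gc.
    return at ** 3 + 2.0 * at * at * gc


def longest_orf_codons(seq):
    """Longest run of consecutive non-stop codons over the three forward frames."""
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                run = 0
            else:
                run += 1
                best = max(best, run)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    for target in (0.3, 0.5, 0.7):
        seq = simulate_dna(30000, target, rng)
        print(
            f"target C+G={target:.1f}  observed C+G={gc_fraction(seq):.3f}  "
            f"expected stop freq={expected_stop_frequency(target):.4f}  "
            f"longest stop-free run={longest_orf_codons(seq)} codons"
        )
```

Because all three stop codons are A+T-rich, C+G-rich random sequence contains fewer stops and therefore longer open reading frames, even though nothing in it codes for protein. This is one simple way in which a statistic used for gene finding can track base composition rather than coding function.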


Similar articles

1
Distinctive sequence features in protein coding, genic non-coding, and intergenic human DNA.
J Mol Biol. 1995 Oct 13;253(1):51-60. doi: 10.1006/jmbi.1995.0535.
2
[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
Yi Chuan Xue Bao. 2004 May;31(5):431-43.
3
Coding DNA repeated throughout intergenic regions of the Arabidopsis thaliana genome: evolutionary footprints of RNA silencing.
Mol Biosyst. 2009 Dec;5(12):1679-87. doi: 10.1039/b903031j. Epub 2009 Apr 30.
4
Negative correlation of G+C content at silent substitution sites between orthologous human and mouse protein-coding sequences.
DNA Res. 2006 Aug 31;13(4):135-40. doi: 10.1093/dnares/dsl007. Epub 2006 Oct 17.
5
Revisiting the relationship between compositional sequence complexity and periodicity.
Comput Biol Chem. 2008 Feb;32(1):17-28. doi: 10.1016/j.compbiolchem.2007.09.001. Epub 2007 Sep 12.
6
Coding and non-coding DNA thermal stability differences in eukaryotes studied by melting simulation, base shuffling and DNA nearest neighbor frequency analysis.
Biophys Chem. 2004 Jul 1;110(1-2):25-38. doi: 10.1016/j.bpc.2004.01.001.
7
Complete sequence of the maize chloroplast genome: gene content, hotspots of divergence and fine tuning of genetic information by transcript editing.
J Mol Biol. 1995 Sep 1;251(5):614-28. doi: 10.1006/jmbi.1995.0460.
8
Identification of coding and non-coding sequences using local Holder exponent formalism.
Bioinformatics. 2005 Oct 15;21(20):3818-23. doi: 10.1093/bioinformatics/bti639. Epub 2005 Aug 23.
9
[Correction of five different types of errors of model REFSEQs appeared in NCBI human gene database only by using two novel human genes C17orf32 and ZNF362].
Yi Chuan Xue Bao. 2004 Apr;31(4):325-34.
10
IdentiCS--identification of coding sequence and in silico reconstruction of the metabolic network directly from unannotated low-coverage bacterial genome sequence.
BMC Bioinformatics. 2004 Aug 16;5:112. doi: 10.1186/1471-2105-5-112.

Cited by

1
Coding sequence density estimation via topological pressure.
J Math Biol. 2015 Jan;70(1-2):45-69. doi: 10.1007/s00285-014-0754-2. Epub 2014 Jan 22.
2
A novel role of the Sp/KLF transcription factor KLF11 in arresting progression of endometriosis.
PLoS One. 2013;8(3):e60165. doi: 10.1371/journal.pone.0060165. Epub 2013 Mar 28.
3
Predicting statistical properties of open reading frames in bacterial genomes.
PLoS One. 2012;7(9):e45103. doi: 10.1371/journal.pone.0045103. Epub 2012 Sep 24.
4
Evaluation of gene-finding programs on mammalian sequences.
Genome Res. 2001 May;11(5):817-32. doi: 10.1101/gr.147901.
5
An assessment of gene prediction accuracy in large DNA sequences.
Genome Res. 2000 Oct;10(10):1631-42. doi: 10.1101/gr.122800.
6
Genomic organization of the S locus: Identification and characterization of genes in SLG/SRK region of S(9) haplotype of Brassica campestris (syn. rapa).
Genetics. 1999 Sep;153(1):391-400. doi: 10.1093/genetics/153.1.391.
7
A relationship between GC content and coding-sequence length.
J Mol Evol. 1996 Sep;43(3):216-23. doi: 10.1007/BF02338829.