The difficulty of identifying genes in anonymous vertebrate sequences.

Authors

Claverie J M, Poirot O, Lopez F

Affiliation

Structural and Genetic Information Laboratory, C.N.R.S.-E.P. 91, Institute of Structural Biology and Microbiology, Marseille, France.

Publication

Comput Chem. 1997;21(4):203-14. doi: 10.1016/s0097-8485(96)00039-3.

DOI: 10.1016/s0097-8485(96)00039-3
PMID: 9415985
Abstract

The identification of genes in newly determined vertebrate genomic sequences can range from a trivial to an impossible task. In a statistical preamble, we show how "insignificant" are the individual features on which gene identification can be rigorously based: promoter signals, splice sites, open reading frames, etc. The practical identification of genes is thus ultimately a tributary of their resemblance to those already present in sequence databases, or incorporated into training sets. The inherent conservatism of the currently popular methods (database similarity search, GRAIL) will greatly limit our capacity for making unexpected biological discoveries from increasingly abundant genomic data. Beyond a very limited subset of trivial cases, the automated interpretation (i.e. without experimental validation) of genomic data, is still a myth. On the other hand, characterizing the 60,000 to 100,000 genes thought to be hidden in the human genome by the mean of individual experiments is not feasible. Thus, it appears that our only hope of turning genome data into genome information must rely on drastic progresses in the way we identify and analyse genes in silico.
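
The statistical point about individual features can be illustrated with a back-of-the-envelope calculation for one of them, the open reading frame. The sketch below is not taken from the paper: it assumes independent, equiprobable bases (so that 3 of the 64 codons are stops), and the function names `p_open` and `expected_chance_orfs` are hypothetical. Under that assumption it estimates how often a stop-free window of a given length arises purely by chance.

```python
# Chance that a stretch of random DNA reads as an open reading frame (ORF).
# Assumption (not from the abstract): bases are independent and equiprobable,
# so 3 of the 64 codons (TAA, TAG, TGA) are stop codons.

STOP_CODONS = 3
ALL_CODONS = 64


def p_open(n_codons: int) -> float:
    """Probability that n_codons consecutive random codons contain no stop."""
    return ((ALL_CODONS - STOP_CODONS) / ALL_CODONS) ** n_codons


def expected_chance_orfs(seq_len_bp: int, n_codons: int) -> float:
    """Rough expected number of stop-free windows of n_codons codons.

    Counts every start position on both strands (2 * seq_len_bp windows),
    ignoring edge effects and the overlap between neighbouring windows.
    """
    return 2 * seq_len_bp * p_open(n_codons)


if __name__ == "__main__":
    # Window length in codons, chance of being stop-free, and the rough
    # number of such windows expected in 100 kb of random sequence.
    for n in (30, 50, 100, 200):
        print(n, p_open(n), expected_chance_orfs(100_000, n))
```

Because the probability decays only as (61/64)^n, short stop-free windows are common in random sequence under this model. Vertebrate exons are often only a few dozen codons long, so an open frame of that size is weak evidence on its own, which is consistent with the abstract's argument. Real genomes have biased base composition, so these numbers are indicative only.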

Similar articles

1
The difficulty of identifying genes in anonymous vertebrate sequences.
Comput Chem. 1997;21(4):203-14. doi: 10.1016/s0097-8485(96)00039-3.
2
[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
Yi Chuan Xue Bao. 2004 May;31(5):431-43.
3
Computational methods for the identification of genes in vertebrate genomic sequences.
Hum Mol Genet. 1997;6(10):1735-44. doi: 10.1093/hmg/6.10.1735.
4
The human SPANX multigene family: genomic organization, alignment and expression in male germ cells and tumor cell lines.
Gene. 2003 May 8;309(2):125-33. doi: 10.1016/s0378-1119(03)00497-9.
5
Computational methods for exon detection.
Mol Biotechnol. 1998 Aug;10(1):27-48. doi: 10.1007/BF02745861.
6
Analysis of the human GDNF gene reveals an inducible promoter, three exons, a triplet repeat within the 3'-UTR and alternative splice products.
Hum Mol Genet. 1998 Nov;7(12):1873-86. doi: 10.1093/hmg/7.12.1873.
7
Transcript mapping of the human chromosome 11q12-q13.1 gene-rich region identifies several newly described conserved genes.
Genomics. 1998 May 1;49(3):419-29. doi: 10.1006/geno.1998.5291.
8
Comparative promoter region analysis powered by CORG.
BMC Genomics. 2005 Feb 21;6:24. doi: 10.1186/1471-2164-6-24.
9
Human transcription factor Sp3: genomic structure, identification of a processed pseudogene, and transcript analysis.
Gene. 2004 Oct 27;341:235-47. doi: 10.1016/j.gene.2004.06.055.
10
Exon detection by similarity searches.
Methods Mol Biol. 1997;68:283-313. doi: 10.1385/0-89603-482-8:283.

Cited by

1
Statistical analysis of synonymous and stop codons in pseudo-random and real sequences as a function of GC content.
Sci Rep. 2023 Dec 27;13(1):22996. doi: 10.1038/s41598-023-49626-9.
2
Recognition of the polycistronic nature of human genes is critical to understanding the genotype-phenotype relationship.
Genome Res. 2018 May;28(5):609-624. doi: 10.1101/gr.230938.117. Epub 2018 Apr 6.
3
Evaluating high-throughput ab initio gene finders to discover proteins encoded in eukaryotic pathogen genomes missed by laboratory techniques.
PLoS One. 2012;7(11):e50609. doi: 10.1371/journal.pone.0050609. Epub 2012 Nov 30.
4
DNA-energetics-based analyses suggest additional genes in prokaryotes.
J Biosci. 2012 Jul;37(3):433-44. doi: 10.1007/s12038-012-9221-7.
5
Hundreds of putatively functional small open reading frames in Drosophila.
Genome Biol. 2011 Nov 25;12(11):R118. doi: 10.1186/gb-2011-12-11-r118.
6
Generic eukaryotic core promoter prediction using structural features of DNA.
Genome Res. 2008 Feb;18(2):310-23. doi: 10.1101/gr.6991408. Epub 2007 Dec 20.
7
Current methods of gene prediction, their strengths and weaknesses.
Nucleic Acids Res. 2002 Oct 1;30(19):4103-17. doi: 10.1093/nar/gkf543.
8
Reverse transcriptase-polymerase chain reaction validation of 25 "orphan" genes from Escherichia coli K-12 MG1655.
Genome Res. 2000 Jul;10(7):959-66. doi: 10.1101/gr.10.7.959.
9
Self-identification of protein-coding regions in microbial genomes.
Proc Natl Acad Sci U S A. 1998 Aug 18;95(17):10026-31. doi: 10.1073/pnas.95.17.10026.