More than 1,000 putative new human signalling proteins revealed by EST data mining.

Authors

Schultz J, Doerks T, Ponting C P, Copley R R, Bork P

Affiliations

[1] EMBL, Heidelberg, Germany. [2] Max-Delbrück-Center, Berlin-Buch, Germany.

Publication information

Nat Genet. 2000 Jun;25(2):201-4. doi: 10.1038/76069.

Abstract

Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes, but EST database searching remains an involved and onerous task. More than 1.6 million human EST sequences have been deposited in public databases, making it difficult to identify ESTs that represent new genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a new method, coupling BLAST-based searches with a domain identification protocol, that filters candidate homologues. Application of this method in a large-scale analysis of 100 signalling domain families has led to the identification of ESTs representing more than 1,000 novel human signalling genes. The 4,206 publicly available ESTs representing these genes are a valuable resource for rapid cloning of novel human signalling proteins. For example, we were able to identify ESTs of at least 106 new small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA led to the discovery of previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product sequence using only genomic and EST sequence data) of a new type of GTPase with two catalytic domains.
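
The two-stage filtering idea in the abstract (a BLAST-based search followed by a domain identification step) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `BlastHit` record, the `filter_candidates` function, the E-value cutoff of 1e-3, and the exclusion of ESTs already assigned to known genes are all hypothetical choices made for illustration, and the inputs are assumed to have been produced beforehand by a similarity search and a domain detection tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    """Hypothetical record for one similarity-search hit."""
    est_id: str    # identifier of the EST sequence
    family: str    # signalling domain family used as the query
    evalue: float  # expectation value reported for the hit


def filter_candidates(hits, domain_calls, known_gene_ests, max_evalue=1e-3):
    """Group ESTs by family, keeping only those that pass both stages.

    hits            -- iterable of BlastHit (stage 1: similarity search)
    domain_calls    -- dict mapping est_id to the set of domain families
                       detected in that EST (stage 2: domain identification)
    known_gene_ests -- set of EST ids already assigned to known genes
    max_evalue      -- assumed cutoff; the abstract does not state one
    """
    candidates = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # stage 1: similarity too weak
        if hit.est_id in known_gene_ests:
            continue  # not novel (assumed criterion)
        if hit.family not in domain_calls.get(hit.est_id, set()):
            continue  # stage 2: domain not confirmed
        candidates.setdefault(hit.family, set()).add(hit.est_id)
    return candidates


# Toy data, invented for illustration only.
example_hits = [
    BlastHit("EST1", "Ras", 1e-10),
    BlastHit("EST2", "Ras", 0.5),    # fails the E-value cutoff
    BlastHit("EST3", "Ras", 1e-8),   # already assigned to a known gene
    BlastHit("EST4", "SH2", 1e-6),   # domain step finds a different family
    BlastHit("EST5", "SH2", 1e-6),
]
example_domains = {
    "EST1": {"Ras"},
    "EST3": {"Ras"},
    "EST4": {"PH"},
    "EST5": {"SH2"},
}
example_known = {"EST3"}

result = filter_candidates(example_hits, example_domains, example_known)
```

Requiring agreement between the two stages is what lets a permissive similarity search be used despite sequencing errors and distant homology: weak or spurious hits are discarded unless the domain step independently supports them.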
