Clark Hilary F, Gurney Austin L, Abaya Evangeline, Baker Kevin, Baldwin Daryl, Brush Jennifer, Chen Jian, Chow Bernard, Chui Clarissa, Crowley Craig, Currell Bridget, Deuel Bethanne, Dowd Patrick, Eaton Dan, Foster Jessica, Grimaldi Christopher, Gu Qimin, Hass Philip E, Heldens Sherry, Huang Arthur, Kim Hok Seon, Klimowski Laura, Jin Yisheng, Johnson Stephanie, Lee James, Lewis Lhney, Liao Dongzhou, Mark Melanie, Robbie Edward, Sanchez Celina, Schoenfeld Jill, Seshagiri Somasekar, Simmons Laura, Singh Jennifer, Smith Victoria, Stinson Jeremy, Vagts Alicia, Vandlen Richard, Watanabe Colin, Wieand David, Woods Kathryn, Xie Ming-Hong, Yansura Daniel, Yi Sothy, Yu Guoying, Yuan Jean, Zhang Min, Zhang Zemin, Goddard Audrey, Wood William I, Godowski Paul, Gray Alane
Departments of Bioinformatics, Molecular Biology and Protein Chemistry, Genentech, Inc, South San Francisco, California 94080, USA.
Genome Res. 2003 Oct;13(10):2265-70. doi: 10.1101/gr.1293003. Epub 2003 Sep 15.
A large-scale effort, termed the Secreted Protein Discovery Initiative (SPDI), was undertaken to identify novel secreted and transmembrane proteins. In the first of several approaches, a biological signal sequence trap in yeast cells was utilized to identify cDNA clones encoding putative secreted proteins. A second strategy utilized various algorithms that recognize features such as the hydrophobic properties of signal sequences to identify putative proteins encoded by expressed sequence tags (ESTs) from human cDNA libraries. A third approach surveyed ESTs for protein sequence similarity to a set of known receptors and their ligands with the BLAST algorithm. Finally, both signal-sequence prediction algorithms and BLAST were used to identify single exons of potential genes from within human genomic sequence. The isolation of full-length cDNA clones for each of these candidate genes resulted in the identification of >1000 novel proteins. A total of 256 of these cDNAs are still novel, including variants and novel genes, per the most recent GenBank release version. The success of this large-scale effort was assessed by a bioinformatics analysis of the proteins through predictions of protein domains, subcellular localizations, and possible functional roles. The SPDI collection should facilitate efforts to better understand intercellular communication, may lead to new understandings of human diseases, and provides potential opportunities for the development of therapeutics.
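The second strategy in the abstract relies on algorithms that recognize the hydrophobic character of signal sequences. The abstract does not say which algorithms were used, so the following is only a minimal sketch of the general idea and not the SPDI method: it scans the N-terminal region of a protein with a sliding window and reports the most hydrophobic stretch on the Kyte-Doolittle hydropathy scale. The function names, the window length, the N-terminal cutoff, and the threshold are all illustrative assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(protein, window=8, n_term=40):
    """Return (mean_hydropathy, start_index) of the most hydrophobic window
    within the first `n_term` residues, or None if the region is too short.

    `window` and `n_term` are illustrative defaults, not values from the paper.
    """
    region = protein[:n_term].upper()
    if len(region) < window:
        return None
    best = None
    for start in range(len(region) - window + 1):
        # Unknown residues (e.g. X) contribute a neutral 0.0.
        total = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in region[start:start + window])
        mean = total / window
        if best is None or mean > best[0]:
            best = (mean, start)
    return best


def has_hydrophobic_n_terminus(protein, threshold=2.0, window=8, n_term=40):
    """Crude filter: True if some N-terminal window exceeds `threshold`.

    The threshold is a hypothetical cutoff chosen for illustration only.
    """
    best = max_window_hydropathy(protein, window=window, n_term=n_term)
    return best is not None and best[0] >= threshold


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    candidate = "MKR" + "L" * 10 + "AQADDDD"
    control = "MSDEEKRKDSEENQ"
    print(has_hydrophobic_n_terminus(candidate))
    print(has_hydrophobic_n_terminus(control))
```

A filter like this only flags a hydrophobic N-terminal core. Dedicated signal-peptide predictors also model the charged N-region and the cleavage site, so a hit here would be a candidate for further analysis rather than a prediction.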