Sagynaliev Emil, Steinert Ralf, Nestler Gerd, Lippert Hans, Knoch Manfred, Reymond Marc-André
Department of Surgery, Johanniter Krankenhaus, Stendal, Germany.
Proteomics. 2005 Aug;5(12):3066-78. doi: 10.1002/pmic.200402107.
Based on biomedical literature databases, we took a first step toward constructing a gene expression "data warehouse" specific to human colorectal cancer (CRC). Results of genome-wide transcriptomic research were available from 12 studies using various technologies, namely SAGE, cDNA and oligonucleotide arrays, and adaptor-tagged amplification. Three studies analyzed CRC cell lines and nine analyzed human samples. The total number of patients was 144. Of 982 up- or down-regulated genes, 863 (88%) were found to be differentially expressed in a single study only, 88 in two studies, 22 in three studies, 7 in four studies, and only 2 genes in six studies. Eight large-scale proteomics studies in CRC have been published, using 2-D, SDS, or free-flow electrophoresis and involving only 11 patients. Of 408 differentially expressed proteins, 339 (83%) were found to be differentially expressed in a single study only, 16 in three studies, 10 in four studies, 3 in five, and 1 in eight studies. Results from large-scale transcriptomics studies could be confirmed at the proteome level in 25% of cases. This proportion was higher (67%) when proteome results were reproduced using transcriptomics technologies. Reproducibility and overlap between published gene expression results at the proteome and transcriptome levels are thus low in human CRC. The development of standardized processes for collecting samples and for storing, retrieving, and querying gene expression data obtained with different technologies is therefore of central importance in translational research.
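The overlap figures above come from counting, for each gene, how many studies reported it as differentially expressed. A minimal Python sketch of such a tally follows. The function name `study_overlap`, the study labels, and the gene identifiers are hypothetical placeholders, not the authors' pipeline or data. The last lines only re-check the transcriptomic percentage from the counts stated in the abstract.

```python
from collections import Counter


def study_overlap(reported):
    """Map 'number of studies reporting a gene' to 'number of such genes'.

    `reported` maps a study label to the collection of gene identifiers
    that the study listed as differentially expressed.
    """
    # Count, per gene, how many studies list it (each study counts once).
    per_gene = Counter(gene for genes in reported.values() for gene in set(genes))
    # Count how many genes fall into each "reported by k studies" bin.
    return dict(sorted(Counter(per_gene.values()).items()))


# Hypothetical toy input, not data from the reviewed studies.
toy = {
    "study_A": {"GENE1", "GENE2", "GENE3"},
    "study_B": {"GENE2", "GENE3"},
    "study_C": {"GENE3", "GENE4"},
}
toy_overlap = study_overlap(toy)

# Distribution stated in the abstract for the transcriptomic studies:
# number of studies -> number of genes.
transcript_counts = {1: 863, 2: 88, 3: 22, 4: 7, 6: 2}
total_genes = sum(transcript_counts.values())
single_study_share = round(100 * transcript_counts[1] / total_genes)
```

In practice, gene identifiers from different platforms would first need mapping to a common nomenclature. That step is assumed here and not shown.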