Vasmatzis George, Klee Eric W, Kube Dagmar M, Therneau Terry M, Kosari Farhad
Mayo Clinic Comprehensive Cancer Center and Division of Experimental Pathology, Department of Laboratory Medicine and Pathology, Rochester, MN 55905, USA.
Bioinformatics. 2007 Jun 1;23(11):1348-55. doi: 10.1093/bioinformatics/btm102. Epub 2007 Mar 23.
We describe a method to identify candidate cancer biomarkers by analyzing numeric approximations of the tissue specificity of human genes. These approximations were calculated from predicted tissue expression distributions, which were derived by mapping expressed sequence tags (ESTs) to the human genome sequence using a binary indexing algorithm. The tissue-specificity values facilitated high-throughput analysis of human genes and enabled the identification of genes highly specific to different tissues. Tissue expression distributions for several genes were compared to estimates obtained from other public gene expression datasets and were experimentally validated using quantitative RT-PCR on RNA isolated from several human tissues. Our results demonstrate that most human genes (approximately 98%) are expressed in many tissues (low specificity), and only a small number of genes possess very specific tissue expression profiles. These genes comprise a rich dataset from which novel therapeutic targets and novel diagnostic serum biomarkers may be selected.
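As an illustration of what a numeric approximation of tissue specificity could look like, the sketch below turns per-tissue EST counts for one gene into an expression distribution and then into a single score. The abstract does not give the authors' formula, so everything here is an assumption: the normalization by library size, the entropy-based score, and the names `expression_distribution`, `specificity`, `est_counts`, and `library_sizes` are hypothetical and chosen only for illustration.

```python
from math import log2


def expression_distribution(est_counts, library_sizes):
    """Turn raw EST counts into a per-tissue distribution that sums to 1.

    Assumption: counts are divided by the number of ESTs sequenced per
    tissue so that deeply sampled tissues do not dominate.
    """
    rates = {t: est_counts.get(t, 0) / library_sizes[t] for t in library_sizes}
    total = sum(rates.values())
    if total == 0:
        # Gene has no mapped ESTs in any tissue.
        return {t: 0.0 for t in rates}
    return {t: r / total for t, r in rates.items()}


def specificity(distribution):
    """Score in [0, 1]: 0 for uniform expression, 1 for a single tissue.

    Assumption: 1 minus the Shannon entropy normalized by log2(n tissues).
    This is a stand-in metric, not the one used in the paper.
    """
    n = len(distribution)
    if n < 2 or sum(distribution.values()) == 0:
        return 0.0
    entropy = -sum(p * log2(p) for p in distribution.values() if p > 0)
    return 1.0 - entropy / log2(n)


if __name__ == "__main__":
    # Made-up library sizes and counts, for demonstration only.
    library_sizes = {"prostate": 1000, "liver": 1000, "lung": 1000, "brain": 1000}
    broad = expression_distribution(
        {"prostate": 10, "liver": 10, "lung": 10, "brain": 10}, library_sizes
    )
    narrow = expression_distribution({"prostate": 40}, library_sizes)
    print("broad:", specificity(broad))
    print("narrow:", specificity(narrow))
```

Under a score of this kind, a gene with ESTs spread evenly across tissues scores near 0 and a gene seen in only one tissue scores 1, which is the contrast the abstract draws between the majority of genes and the small, highly specific subset.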