College of Bioinformatics Science and Technology and Bio-pharmaceutical Key Laboratory of Heilongjiang Province, Harbin Medical University, Harbin, People's Republic of China.
J R Soc Interface. 2012 May 7;9(70):1063-72. doi: 10.1098/rsif.2011.0551. Epub 2011 Oct 13.
Numerous gene sets have been used as molecular signatures for exploring the genetic basis of complex disorders. These gene sets are distinct but often related to one another; therefore, efforts have been made to compare them, for example in studies evaluating the reproducibility of different experiments. Comparisons in terms of biological function have proven helpful to biologists. We improved the measurement of semantic similarity to quantify the functional association between gene sets in the context of the Gene Ontology (GO) and developed a web toolkit named Gene Set Functional Similarity (GSFS; http://bioinfo.hrbmu.edu.cn/GSFS). Validation on protein complexes whose functional associations are known showed that GSFS scores tend to correlate with sequence similarity scores and that complexes with high GSFS scores tend to belong to the same functional catalogue. Compared with the pairwise method and the annotation method, GSFS discriminates better and more accurately reflects the known functional catalogues shared between complexes. Two case studies, comparing differentially expressed genes of prostate tumour samples from different microarray platforms and identifying coronary heart disease susceptibility pathways, indicated that the method could contribute to future studies exploring the molecular basis of complex disorders.
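To make the idea of a GO-based functional association between two gene sets concrete, the following is a minimal sketch of one simple baseline. It is not the GSFS measure, whose details are not given in this abstract. It assumes a toy GO fragment and toy annotations (all term IDs and gene names are hypothetical), propagates each set's annotations up the GO graph to all ancestor terms, and scores the overlap of the two resulting term sets with the Jaccard index.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy GO fragment: each term maps to its direct parent terms.
GO_PARENTS: Dict[str, Set[str]] = {
    "GO:A": set(),            # root term
    "GO:B": {"GO:A"},
    "GO:C": {"GO:A"},
    "GO:D": {"GO:B"},
    "GO:E": {"GO:B", "GO:C"},
}

# Hypothetical gene -> directly annotated GO terms.
ANNOTATIONS: Dict[str, Set[str]] = {
    "g1": {"GO:D"},
    "g2": {"GO:E"},
    "g3": {"GO:C"},
}


def ancestors_inclusive(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term together with all of its ancestors in the DAG."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagated_terms(genes: Iterable[str],
                     annotations: Dict[str, Set[str]],
                     parents: Dict[str, Set[str]]) -> Set[str]:
    """Union of all annotated terms of a gene set, propagated to ancestors."""
    terms: Set[str] = set()
    for g in genes:
        for t in annotations.get(g, ()):
            terms |= ancestors_inclusive(t, parents)
    return terms


def gene_set_similarity(set_a: Iterable[str], set_b: Iterable[str],
                        annotations: Dict[str, Set[str]],
                        parents: Dict[str, Set[str]]) -> float:
    """Jaccard index of the propagated GO term sets (a simple baseline)."""
    ta = propagated_terms(set_a, annotations, parents)
    tb = propagated_terms(set_b, annotations, parents)
    union = ta | tb
    if not union:
        return 0.0  # neither set has any annotation
    return len(ta & tb) / len(union)


if __name__ == "__main__":
    # {g1} -> {D, B, A}; {g2, g3} -> {E, B, C, A}; shared {A, B} of 5 terms
    print(gene_set_similarity({"g1"}, {"g2", "g3"}, ANNOTATIONS, GO_PARENTS))  # 0.4
```

Propagating to ancestors lets two sets annotated to different but related terms (here GO:D and GO:E, both under GO:B) still register partial similarity. A plain overlap of this kind ignores term specificity, which is one reason information-content-based semantic similarity measures are generally preferred in the literature.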