Guo Zheng, Zhang Tianwen, Li Xia, Wang Qi, Xu Jianzhen, Yu Hui, Zhu Jing, Wang Haiyun, Wang Chenguang, Topol Eric J, Wang Qing, Rao Shaoqi
Department of Computer Science, Harbin Institute of Technology, Harbin 150001, China.
BMC Bioinformatics. 2005 Mar 17;6:58. doi: 10.1186/1471-2105-6-58.
Development of robust and efficient methods for analyzing and interpreting high-dimensional gene expression profiles continues to be a focus in computational biology. Accumulated experimental evidence supports the assumption that genes are expressed and perform their functions in a modular fashion in cells. There is therefore room to develop timely and relevant computational algorithms that use robust functional expression profiles to classify complex human diseases precisely at the modular level.
Inspired by the insight that genes act as modules to carry out highly integrated cellular functions, we define a low-dimensional functional expression profile for data reduction. After annotating each gene to the functional categories of an appropriate gene function classification system (Gene Ontology in this study), we identify the functional categories enriched with differentially expressed genes. For each functional category, or functional module, we compute one or more summary measures of the raw expression values of its annotated genes to capture the overall activity level of the module. In this way, the expression of the genes within a functional module is treated as a single integrative data point that replaces the multiple values of the individual genes. Using four publicly available datasets, we compare the classification performance of decision trees built on functional expression profiles with that of decision trees built on conventional gene expression profiles. The results indicate that precise classification of tumour types, together with improved interpretation, can be achieved with the reduced functional expression profiles.
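The data-reduction step can be illustrated with a minimal sketch in plain Python. The abstract does not name the enrichment test or the summary statistic. This sketch therefore assumes a one-sided hypergeometric test for over-representation of differentially expressed (DE) genes in each module, with no multiple-testing correction. It also assumes the per-sample arithmetic mean as the summary measure. The function names, the `alpha=0.05` cutoff and the toy module and gene names are hypothetical.

```python
from math import comb
from statistics import mean


def hypergeom_enrichment_p(n_total, n_de, n_module, n_module_de):
    """One-sided hypergeometric p-value P(X >= n_module_de) (assumed test).

    X counts DE genes in a module of n_module genes drawn from a universe of
    n_total genes, n_de of which are differentially expressed.
    """
    upper = min(n_module, n_de)
    tail = sum(
        comb(n_de, k) * comb(n_total - n_de, n_module - k)
        for k in range(n_module_de, upper + 1)
    )
    return tail / comb(n_total, n_module)


def enriched_modules(annotation, de_genes, all_genes, alpha=0.05):
    """Return ids of modules over-represented among DE genes.

    annotation: module id -> list of annotated gene ids (e.g. from GO).
    Hypothetical helper; no multiple-testing correction is applied.
    """
    universe = set(all_genes)
    de = set(de_genes) & universe
    selected = []
    for module, genes in annotation.items():
        members = set(genes) & universe
        if not members:
            continue
        p = hypergeom_enrichment_p(len(universe), len(de),
                                   len(members), len(members & de))
        if p < alpha:
            selected.append(module)
    return selected


def functional_profile(expr, annotation, modules, summary=mean):
    """Collapse gene-level expression into one value per module per sample.

    expr: gene id -> list of raw expression values, one per sample.
    summary: assumed summary measure (mean by default; median also works).
    """
    n_samples = len(next(iter(expr.values())))
    profile = {}
    for module in modules:
        members = [g for g in annotation[module] if g in expr]
        if members:
            profile[module] = [summary([expr[g][j] for g in members])
                               for j in range(n_samples)]
    return profile


if __name__ == "__main__":
    # Toy data (hypothetical): 8 genes, 2 samples, 2 modules.
    all_genes = [f"g{i}" for i in range(1, 9)]
    de_genes = ["g1", "g2", "g3"]
    annotation = {"moduleA": ["g1", "g2", "g3"], "moduleB": ["g4", "g5", "g6"]}
    expr = {g: [float(i), float(i + 1)] for i, g in enumerate(all_genes, start=1)}

    modules = enriched_modules(annotation, de_genes, all_genes)
    print(modules)                                         # ['moduleA']
    print(functional_profile(expr, annotation, modules))   # {'moduleA': [2.0, 3.0]}
```

The resulting module-by-sample matrix replaces the gene-by-sample matrix as input to a classifier such as a decision tree. Each feature then corresponds to a named functional category rather than a single probe.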
This modular approach is shown to be a powerful alternative for analyzing high-dimensional microarray data. It is also robust to the high measurement noise and intrinsic biological variance inherent in such data. Furthermore, efficient integration with current biological knowledge facilitates interpretation of the molecular mechanisms underlying complex human diseases at the modular level.