School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
Genomics Proteomics Bioinformatics. 2022 Oct;20(5):1013-1027. doi: 10.1016/j.gpb.2022.03.001. Epub 2022 May 11.
Gene Ontology (GO) has been widely used to annotate functions of genes and gene products. Here, we propose a new method, TripletGO, to deduce GO terms of protein-coding and non-coding genes through the integration of four complementary pipelines built on transcript expression profile, genetic sequence alignment, protein sequence alignment, and naïve probability. TripletGO was tested on a large set of 5754 genes from 8 species (human, mouse, Arabidopsis, rat, fly, budding yeast, fission yeast, and nematode) and on 2433 proteins with available expression data from the third Critical Assessment of Protein Function Annotation challenge (CAFA3). Experimental results show that TripletGO achieves function annotation accuracy significantly higher than that of the current state-of-the-art approaches. Detailed analyses show that the major advantage of TripletGO lies in the coupling of a new triplet network-based profiling method with the feature space mapping technique, which can accurately recognize function patterns from transcript expression profiles. Meanwhile, the combination of multiple complementary models, especially those from transcript expression and protein-level alignments, improves the coverage and accuracy of the final GO annotation results. The standalone package and an online server of TripletGO are freely available at https://zhanggroup.org/TripletGO/.
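To illustrate the general idea behind a triplet network, the following is a minimal sketch of the standard triplet margin loss, which pulls an anchor embedding toward a "positive" example (here, hypothetically, a gene sharing a GO term) and pushes it away from a "negative" one. The abstract does not specify the loss, distance metric, or margin used by TripletGO, so the Euclidean distance, the `margin` default, and the function names below are all assumptions made for illustration only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (an assumed hyperparameter).
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings of expression profiles (not real data).
anchor = [0.0, 0.0]
same_function = [0.0, 1.0]   # distance 1 from the anchor
other_function = [3.0, 4.0]  # distance 5 from the anchor

well_separated = triplet_margin_loss(anchor, same_function, other_function)
violated = triplet_margin_loss(anchor, other_function, same_function)
```

In this toy case the first triplet is already well separated, so its loss is zero, while swapping the positive and negative roles yields a positive loss that training would reduce.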