Information theory applied to the sparse gene ontology annotation network to predict novel gene function.

Authors

Tao Ying, Sam Lee, Li Jianrong, Friedman Carol, Lussier Yves A

Affiliation

Department of Biomedical Informatics, Columbia University, 622 West 168th Street, VC5, New York, NY 10032, USA.

Publication

Bioinformatics. 2007 Jul 1;23(13):i529-38. doi: 10.1093/bioinformatics/btm195.

DOI:10.1093/bioinformatics/btm195
PMID:17646340
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2882681/
Abstract

MOTIVATION

Despite advances in the gene annotation process, the functions of a large portion of gene products remain insufficiently characterized. In addition, the in silico prediction of novel Gene Ontology (GO) annotations for partially characterized gene functions or processes is highly dependent on reverse genetic or functional genomic approaches. To our knowledge, no prediction method has been demonstrated to be highly accurate for sparsely annotated GO terms (those associated with fewer than 10 genes).
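The abstract defines sparsely annotated GO terms operationally: terms associated with fewer than 10 genes. The Python sketch below applies that filter. It assumes the annotations have already been parsed into (gene, GO term) pairs, for example from a GO Annotation file. The gene and term names are hypothetical. The sketch counts direct annotations only, because the abstract does not say whether annotations are propagated up the GO graph before counting.

```python
from collections import defaultdict

# Threshold from the abstract: sparse GO terms are associated with fewer than 10 genes.
SPARSE_THRESHOLD = 10


def genes_per_term(annotations):
    """Map each GO term to the set of distinct genes directly annotated to it."""
    term_genes = defaultdict(set)
    for gene, term in annotations:
        term_genes[term].add(gene)  # a set ignores duplicate (gene, term) rows
    return term_genes


def sparse_terms(annotations, threshold=SPARSE_THRESHOLD):
    """Return the GO terms annotated to fewer than `threshold` distinct genes."""
    return {
        term
        for term, genes in genes_per_term(annotations).items()
        if len(genes) < threshold
    }


if __name__ == "__main__":
    # Hypothetical toy data: one term with 12 genes, one with 3.
    toy = [(f"GENE{i:02d}", "T_dense") for i in range(12)]
    toy += [(f"GENE{i:02d}", "T_sparse") for i in range(3)]
    print(sparse_terms(toy))  # {'T_sparse'}
```

Under this definition, a term with exactly 10 genes counts as dense, because the test is strictly "fewer than".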

RESULTS

We propose a novel approach, information theory-based semantic similarity (ITSS), to automatically predict molecular functions of genes from existing GO annotations. Using 10-fold cross-validation, we demonstrate that the ITSS algorithm achieves prediction accuracy (precision 97%, recall 77%) comparable to that of other machine learning algorithms evaluated under similar conditions over densely annotated portions of the GO datasets. The method also generates highly accurate predictions in sparsely annotated portions of GO, where previous algorithms have failed. As a result, our technique generates an order of magnitude more functional predictions than previous methods. Over sparsely annotated networks of recent GO annotations (about 1,400 GO terms and 11,000 genes in Homo sapiens), 10-fold cross-validation showed a precision of 90% at a recall of 36%. To our knowledge, this article presents the first historical rollback validation of predicted GO annotations, which may reflect more realistic conditions than the more widely used cross-validation approaches. By manually assessing a random sample of 100 predictions from a historical rollback evaluation, we estimate that a minimum precision of 51% (95% confidence interval: 43-58%) can be achieved for the human GO Annotation file dated 2003.
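The abstract does not give the ITSS formula. As a hedged illustration of an information theory-based semantic similarity, the sketch below uses Lin's information-content measure. This is a common measure swapped in here for illustration and is not necessarily the authors' exact ITSS.

The sketch works as follows:

- Each term's information content is IC(t) = -log p(t), where p(t) is the fraction of genes annotated to t or to one of its descendants.
- Two terms are compared through their most informative common ancestor.
- Each unannotated term is scored for a gene by its best similarity to that gene's existing terms.

The toy ontology, genes, and helper names (`score_candidates`, `precision_recall`) are hypothetical.

```python
import math

# Hypothetical toy GO fragment: term -> set of parent terms ("ROOT" has none).
PARENTS = {
    "ROOT": set(),
    "binding": {"ROOT"},
    "catalytic": {"ROOT"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
    "kinase": {"catalytic"},
}

# Hypothetical direct annotations: gene -> set of GO terms.
ANNOTATIONS = {
    "g1": {"dna_binding"},
    "g2": {"dna_binding", "kinase"},
    "g3": {"rna_binding"},
    "g4": {"kinase"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t) = log(n_genes / count(t)), counting genes on t or any descendant."""
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        propagated = set().union(*(ancestors(t) for t in terms))  # true-path rule
        for t in propagated:
            counts[t] += 1
    n_genes = len(ANNOTATIONS)
    return {t: math.log(n_genes / c) for t, c in counts.items() if c > 0}


IC = information_content()


def lin_similarity(t1, t2):
    """Lin similarity: 2 * IC(MICA) / (IC(t1) + IC(t2)); MICA = most informative common ancestor."""
    mica_ic = max(IC[t] for t in ancestors(t1) & ancestors(t2))
    denom = IC[t1] + IC[t2]
    return 2 * mica_ic / denom if denom > 0 else 1.0


def score_candidates(gene):
    """Score each term not already implied by the gene's annotations by best Lin similarity."""
    known = ANNOTATIONS[gene]
    implied = set().union(*(ancestors(k) for k in known))  # known terms plus their ancestors
    return {t: max(lin_similarity(t, k) for k in known) for t in IC if t not in implied}


def precision_recall(predicted, held_out):
    """Precision and recall of predicted (gene, term) pairs against held-out true pairs."""
    tp = len(predicted & held_out)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(held_out) if held_out else 0.0
    return precision, recall


if __name__ == "__main__":
    scores = score_candidates("g1")
    best = max(scores, key=scores.get)
    print(best, round(scores[best], 3))  # rna_binding 0.277
```

In a 10-fold cross-validation like the one reported, each fold's held-out (gene, term) pairs would first be removed from `ANNOTATIONS`. IC would then be recomputed and the remaining genes rescored. Finally, `precision_recall` would be applied to the pairs whose score passes a chosen threshold. Raising that threshold trades recall for precision, which is the kind of trade-off behind figures such as 90% precision at 36% recall.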

AVAILABILITY

The program is available on request. The 97,732 positive predictions of novel gene annotations from the 2005 GO Annotation dataset and other supplementary information are available at http://phenos.bsd.uchicago.edu/ITSS/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1
Information theory applied to the sparse gene ontology annotation network to predict novel gene function.
Bioinformatics. 2007 Jul 1;23(13):i529-38. doi: 10.1093/bioinformatics/btm195.
2
Exploiting ontology graph for predicting sparsely annotated gene function.
Bioinformatics. 2015 Jun 15;31(12):i357-64. doi: 10.1093/bioinformatics/btv260.
3
The development of PIPA: an integrated and automated pipeline for genome-wide protein function annotation.
BMC Bioinformatics. 2008 Jan 25;9:52. doi: 10.1186/1471-2105-9-52.
4
Incorporating functional inter-relationships into protein function prediction algorithms.
BMC Bioinformatics. 2009 May 12;10:142. doi: 10.1186/1471-2105-10-142.
5
A relation based measure of semantic similarity for Gene Ontology annotations.
BMC Bioinformatics. 2008 Nov 4;9:468. doi: 10.1186/1471-2105-9-468.
6
How to decide which are the most pertinent overly-represented features during gene set enrichment analysis.
BMC Bioinformatics. 2007 Sep 11;8:332. doi: 10.1186/1471-2105-8-332.
7
Measuring semantic similarities by combining gene ontology annotations and gene co-function networks.
BMC Bioinformatics. 2015 Feb 14;16:44. doi: 10.1186/s12859-015-0474-7.
8
Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks.
BMC Bioinformatics. 2007 Jul 10;8:243. doi: 10.1186/1471-2105-8-243.
9
The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D262-6. doi: 10.1093/nar/gkh021.
10
AVID: an integrative framework for discovering functional relationships among proteins.
BMC Bioinformatics. 2005 Jun 1;6:136. doi: 10.1186/1471-2105-6-136.

Cited by

1
Extracellular Nicotinamide Phosphoribosyltransferase Is a Therapeutic Target in Experimental Necrotizing Enterocolitis.
Biomedicines. 2024 Apr 28;12(5):970. doi: 10.3390/biomedicines12050970.
2
Learning sequence, structure, and function representations of proteins with language models.
bioRxiv. 2023 Nov 26:2023.11.26.568742. doi: 10.1101/2023.11.26.568742.
3
The eNAMPT/TLR4 inflammatory cascade drives the severity of intra-amniotic inflammation in pregnancy and predicts infant outcomes.
Front Physiol. 2023 Jun 20;14:1129413. doi: 10.3389/fphys.2023.1129413. eCollection 2023.
4
Epithelial cell responses to rhinovirus identify an early-life-onset asthma phenotype in adults.
J Allergy Clin Immunol. 2022 Sep;150(3):604-611. doi: 10.1016/j.jaci.2022.03.020. Epub 2022 Mar 31.
5
HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey.
BMC Bioinformatics. 2022 Jan 6;23(1):23. doi: 10.1186/s12859-021-04539-0.
6
Endothelial eNAMPT drives EndMT and preclinical PH: rescue by an eNAMPT-neutralizing mAb.
Pulm Circ. 2021 Nov 12;11(4):20458940211059712. doi: 10.1177/20458940211059712. eCollection 2021 Oct-Dec.
7
'Single-subject studies'-derived analyses unveil altered biomechanisms between very small cohorts: implications for rare diseases.
Bioinformatics. 2021 Jul 12;37(Suppl_1):i67-i75. doi: 10.1093/bioinformatics/btab290.
8
Gene function finding through cross-organism ensemble learning.
BioData Min. 2021 Feb 12;14(1):14. doi: 10.1186/s13040-021-00239-w.
9
Predicting functions of maize proteins using graph convolutional network.
BMC Bioinformatics. 2020 Dec 16;21(Suppl 16):420. doi: 10.1186/s12859-020-03745-6.
10
Rhinovirus Infections in Individuals with Asthma Increase ACE2 Expression and Cytokine Pathways Implicated in COVID-19.
Am J Respir Crit Care Med. 2020 Sep 1;202(5):753-755. doi: 10.1164/rccm.202004-1343LE.