Prediction of Drosophila melanogaster gene function using Support Vector Machines.

Affiliation

Toronto Health Economics and Technology Assessment (THETA) Collaborative, University of Toronto, Toronto, Canada.

Publication information

BioData Min. 2013 Apr 2;6(1):8. doi: 10.1186/1756-0381-6-8.

DOI:10.1186/1756-0381-6-8
PMID:23547736
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3669044/
Abstract

BACKGROUND

While the genomes of hundreds of organisms have been sequenced and good approaches exist for finding protein-encoding genes, an important remaining challenge is predicting the functions of the large fraction of genes that have no annotation. Large gene expression datasets from microarray experiments already exist, and many of these can be used to help assign potential functions to these genes. We applied Support Vector Machines (SVM), a sigmoid fitting function and a stratified cross-validation approach to a large microarray dataset from Drosophila melanogaster in order to predict possible functions for previously un-annotated genes. Approximately 5043 different genes, or about one-third of the predicted genes in the D. melanogaster genome, are represented in the dataset, and 1854 (37%) of these genes are un-annotated.
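The stratified cross-validation mentioned above keeps the share of annotated (positive) genes roughly equal in every fold, which matters when a functional category has few members. The following is a minimal pure-Python sketch of that idea only; the helper name `stratified_folds`, the fold count, and the toy labels are illustrative assumptions, not the authors' setup.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Toy data (assumed): 10 genes annotated to a category (1), 40 not annotated to it (0).
labels = [1] * 10 + [0] * 40
folds = stratified_folds(labels, k=5)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

Each fold then serves once as the held-out test set while the remaining folds are used for training, so every fold contains some positive examples to evaluate on.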

RESULTS

We found 39 Gene Ontology Biological Process (GO-BP) categories with a precision of 0.75 or higher when recall was fixed at 0.4. For two of those categories, we provide additional support for assigning given genes to the category by showing that the majority of transcripts for the genes belonging to that category have a similar localization pattern during embryogenesis. Additionally, by assessing the predictions using a confidence score, we were able to provide a putative GO-BP term for 1422 previously un-annotated genes, which is about 77% of the un-annotated genes represented on the microarray and about 19% of all un-annotated genes in the D. melanogaster genome.
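The "precision at a fixed recall of 0.4" criterion can be read as follows: rank genes by classifier score, take the shortest top-ranked prefix that recovers at least 40% of the known positives, and report the fraction of that prefix that is truly positive. The sketch below illustrates this reading in pure Python; the function name and toy scores are assumptions, and the paper may treat ties or interpolation differently.

```python
def precision_at_recall(scores, labels, target_recall=0.4):
    """Precision of the shortest top-ranked prefix whose recall >= target_recall."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank samples from highest to lowest classifier score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / total_pos >= target_recall:
            return true_pos / rank
    raise ValueError("target_recall must not exceed 1.0")


# Toy example (assumed data): 5 positives among 10 ranked genes.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
# Recall reaches 0.4 (2 of 5 positives) at rank 3, so precision is 2/3.
example_precision = precision_at_recall(scores, labels)
```

Under this reading, a category passes the 0.75 threshold only if at least three of every four genes in that top-ranked prefix are already annotated to it.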

CONCLUSIONS

Our study employs a number of SVM classifiers, accompanied by detailed calibration and validation techniques, to generate predictions of new annotations for D. melanogaster genes. Applying probabilistic analysis to the SVM output improves the interpretability of the prediction results and the objectivity of the validation procedure.
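The abstract mentions only "a sigmoid fitting function" for turning SVM output into probabilities. One common way to do this is Platt-style scaling, which fits P(y=1 | f) = 1 / (1 + exp(A·f + B)) to the SVM decision values f. The sketch below assumes that form and fits A and B by plain gradient descent on the negative log-likelihood; the function names, learning rate, and toy data are illustrative assumptions, and Platt's regularized targets are omitted for brevity.

```python
import math


def sigmoid_prob(f, a, b):
    """Map an SVM decision value f to a probability via 1 / (1 + exp(a*f + b))."""
    z = a * f + b
    # Numerically stable evaluation for both signs of z.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fit_sigmoid(decision_values, labels, lr=0.1, steps=2000):
    """Fit a and b by gradient descent on the mean negative log-likelihood."""
    a, b = 0.0, 0.0
    n = len(labels)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for f, y in zip(decision_values, labels):
            p = sigmoid_prob(f, a, b)
            # With z = a*f + b, the derivative of the loss w.r.t. z is (y - p).
            grad_a += (y - p) * f
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Toy decision values (assumed): positives tend to score higher, with some overlap.
decision_values = [-2.0, -1.5, -1.0, -0.5, 0.5, 0.2, 1.0, 1.5, 2.0, -0.2]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
a, b = fit_sigmoid(decision_values, labels)
probs = [sigmoid_prob(f, a, b) for f in decision_values]
```

The fitted slope `a` comes out negative here, so larger decision values map to higher probabilities. Such calibrated outputs are what make a shared confidence threshold across classifiers meaningful.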

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e541/3669044/e557fea75b3e/1756-0381-6-8-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e541/3669044/80c693c06e97/1756-0381-6-8-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e541/3669044/e8b819f1171d/1756-0381-6-8-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e541/3669044/5b6d7f75cedf/1756-0381-6-8-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e541/3669044/97390878e834/1756-0381-6-8-5.jpg

Similar articles

1
Prediction of Drosophila melanogaster gene function using Support Vector Machines.
BioData Min. 2013 Apr 2;6(1):8. doi: 10.1186/1756-0381-6-8.
2
3
Information theory applied to the sparse gene ontology annotation network to predict novel gene function.
Bioinformatics. 2007 Jul 1;23(13):i529-38. doi: 10.1093/bioinformatics/btm195.
4
Applying Support Vector Machines for Gene Ontology based gene function prediction.
BMC Bioinformatics. 2004 Aug 26;5:116. doi: 10.1186/1471-2105-5-116.
5
Predicting gene ontology from a global meta-analysis of 1-color microarray experiments.
BMC Bioinformatics. 2011 Oct 18;12 Suppl 10(Suppl 10):S14. doi: 10.1186/1471-2105-12-S10-S14.
6
Term-tissue specific models for prediction of gene ontology biological processes using transcriptional profiles of aging in drosophila melanogaster.
BMC Bioinformatics. 2008 Feb 28;9:129. doi: 10.1186/1471-2105-9-129.
7
A genome-wide gene function prediction resource for Drosophila melanogaster.
PLoS One. 2010 Aug 12;5(8):e12139. doi: 10.1371/journal.pone.0012139.
8
Structural and functional-annotation of an equine whole genome oligoarray.
BMC Bioinformatics. 2009 Oct 8;10 Suppl 11(Suppl 11):S8. doi: 10.1186/1471-2105-10-S11-S8.
9
An iterative approach of protein function prediction.
BMC Bioinformatics. 2011 Nov 10;12:437. doi: 10.1186/1471-2105-12-437.
10
Functional knowledge transfer for high-accuracy prediction of under-studied biological processes.
PLoS Comput Biol. 2013;9(3):e1002957. doi: 10.1371/journal.pcbi.1002957. Epub 2013 Mar 14.

Cited by

1
Gene function finding through cross-organism ensemble learning.
BioData Min. 2021 Feb 12;14(1):14. doi: 10.1186/s13040-021-00239-w.
2
A Factor Graph Approach to Automated GO Annotation.
PLoS One. 2016 Jan 15;11(1):e0146986. doi: 10.1371/journal.pone.0146986. eCollection 2016.
3
Putative synaptic genes defined from a Drosophila whole body developmental transcriptome by a machine learning approach.
BMC Genomics. 2015 Sep 15;16(1):694. doi: 10.1186/s12864-015-1888-3.
4
Using multi-instance hierarchical clustering learning system to predict yeast gene function.
PLoS One. 2014 Mar 12;9(3):e90962. doi: 10.1371/journal.pone.0090962. eCollection 2014.
5
Matrix factorization-based data fusion for gene function prediction in baker's yeast and slime mold.
Pac Symp Biocomput. 2014:400-11.

References cited in this article

1
A genome-wide gene function prediction resource for Drosophila melanogaster.
PLoS One. 2010 Aug 12;5(8):e12139. doi: 10.1371/journal.pone.0012139.
2
Gene networks in Drosophila melanogaster: integrating experimental data to predict gene function.
Genome Biol. 2009;10(9):R97. doi: 10.1186/gb-2009-10-9-r97. Epub 2009 Sep 16.
3
FlyBase: enhancing Drosophila Gene Ontology annotations.
Nucleic Acids Res. 2009 Jan;37(Database issue):D555-9. doi: 10.1093/nar/gkn788. Epub 2008 Oct 23.
4
A critical assessment of Mus musculus gene function prediction using integrated genomic evidence.
Genome Biol. 2008;9 Suppl 1(Suppl 1):S2. doi: 10.1186/gb-2008-9-s1-s2. Epub 2008 Jun 27.
5
Term-tissue specific models for prediction of gene ontology biological processes using transcriptional profiles of aging in drosophila melanogaster.
BMC Bioinformatics. 2008 Feb 28;9:129. doi: 10.1186/1471-2105-9-129.
6
A single gene network accurately predicts phenotypic effects of gene perturbation in Caenorhabditis elegans.
Nat Genet. 2008 Feb;40(2):181-8. doi: 10.1038/ng.2007.70. Epub 2008 Jan 27.
7
Global analysis of mRNA localization reveals a prominent role in organizing cellular architecture and function.
Cell. 2007 Oct 5;131(1):174-87. doi: 10.1016/j.cell.2007.08.003.
8
An improved, bias-reduced probabilistic functional gene network of baker's yeast, Saccharomyces cerevisiae.
PLoS One. 2007 Oct 3;2(10):e988. doi: 10.1371/journal.pone.0000988.
9
Combining classifiers to predict gene function in Arabidopsis thaliana using large-scale gene expression measurements.
BMC Bioinformatics. 2007 Sep 21;8:358. doi: 10.1186/1471-2105-8-358.
10
What is a support vector machine?
Nat Biotechnol. 2006 Dec;24(12):1565-7. doi: 10.1038/nbt1206-1565.