Partial AUC maximization for essential gene prediction using genetic algorithms.

Affiliation

School of Computer Science and Engineering, Soongsil University, Seoul, Korea.

Publication information

BMB Rep. 2013 Jan;46(1):41-6. doi: 10.5483/bmbrep.2013.46.1.159.

Abstract

Identifying the genes that are indispensable for an organism's survival, and characterizing them, is one of the central questions in current biological research, so computational approaches to predicting essential genes are valuable. The performance of a predictor is usually measured by the area under the receiver operating characteristic curve (AUC). We propose a novel method that uses genetic algorithms to maximize the partial AUC restricted to a specific interval of low false positive rate (FPR), the region relevant to follow-up experimental validation. Our predictor uses various features based on sequence information, protein-protein interaction network topology, and gene expression profiles. A feature selection wrapper was developed to alleviate over-fitting and to weigh each feature's relevance to prediction. We evaluated our method using the proteome of budding yeast. Our genetic algorithm implementation, maximizing the partial AUC below an FPR of 0.05 or 0.10, outperformed other popular classification methods.
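To make the objective concrete, the following is a minimal sketch of a partial AUC computation and a toy genetic algorithm that searches for linear feature weights maximizing it. It is not the authors' implementation: the abstract does not specify the encoding, operators, or parameters, so the function names `partial_auc` and `evolve_weights`, the linear scoring model, tournament selection, uniform crossover, Gaussian mutation, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def partial_auc(scores, labels, max_fpr=0.1):
    """Area under the ROC curve for FPR in [0, max_fpr] (not normalized)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)  # 1 = essential, 0 = non-essential
    order = np.argsort(-scores, kind="stable")
    s, y = scores[order], labels[order]
    # Keep one ROC point per distinct score so tied scores share a threshold.
    last = np.r_[np.nonzero(np.diff(s))[0], len(s) - 1]
    tpr = np.r_[0.0, np.cumsum(y)[last] / y.sum()]
    fpr = np.r_[0.0, np.cumsum(1 - y)[last] / (1 - y).sum()]
    if max_fpr < 1.0:
        # Cut the curve at max_fpr, linearly interpolating the end point.
        stop = np.searchsorted(fpr, max_fpr, side="right")
        x0, x1 = fpr[stop - 1], fpr[stop]
        y_end = tpr[stop - 1] + (max_fpr - x0) / (x1 - x0) * (tpr[stop] - tpr[stop - 1])
        fpr = np.r_[fpr[:stop], max_fpr]
        tpr = np.r_[tpr[:stop], y_end]
    # Trapezoidal rule over the (possibly truncated) curve.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def evolve_weights(X, y, max_fpr=0.1, pop_size=40, generations=30, seed=0):
    """Toy GA (assumed design): evolve weights w so that X @ w has high partial AUC."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, X.shape[1]))
    best_w, best_fit = None, -np.inf
    for _ in range(generations):
        fitness = np.array([partial_auc(X @ w, y, max_fpr) for w in pop])
        i = int(np.argmax(fitness))
        if fitness[i] > best_fit:
            best_w, best_fit = pop[i].copy(), float(fitness[i])
        # Binary tournament selection.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(fitness[a] >= fitness[b], a, b)]
        # Uniform crossover with a randomly shuffled partner.
        partners = parents[rng.permutation(pop_size)]
        mask = rng.random(parents.shape) < 0.5
        children = np.where(mask, parents, partners)
        # Gaussian mutation, then elitism: carry the best individual forward.
        children = children + rng.normal(scale=0.1, size=children.shape)
        children[0] = best_w
        pop = children
    return best_w, best_fit

# Synthetic demo data (hypothetical): feature 0 is informative, the rest are noise.
_rng = np.random.default_rng(0)
y_demo = (_rng.random(200) < 0.2).astype(int)
X_demo = _rng.normal(size=(200, 3))
X_demo[:, 0] += 2.0 * y_demo
w_demo, fit_demo = evolve_weights(X_demo, y_demo, max_fpr=0.1)
print("best weights:", w_demo, "partial AUC:", fit_demo)
```

The returned area is bounded above by `max_fpr`, so dividing by `max_fpr` gives a value in [0, 1] that is easier to compare across cut-offs such as 0.05 and 0.10. A feature selection wrapper of the kind the abstract mentions could be layered on top, for example by evolving a binary mask alongside the weights, but that is again an assumption rather than something the abstract describes.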
