School of Computer Science and Engineering, Soongsil University, Seoul, Korea.
BMB Rep. 2013 Jan;46(1):41-6. doi: 10.5483/bmbrep.2013.46.1.159.
Identifying the genes that are indispensable for an organism's life, along with their characteristics, is one of the central questions in current biological research, so computational approaches for predicting essential genes would be valuable. The performance of a predictor is usually measured by the area under the receiver operating characteristic curve (AUC). We propose a novel method that uses genetic algorithms to maximize the partial AUC restricted to a specific interval of low false positive rate (FPR), the region relevant to follow-up experimental validation. Our predictor uses various features based on sequence information, protein-protein interaction network topology, and gene expression profiles. A feature selection wrapper was developed to alleviate over-fitting and to weigh each feature's relevance to prediction. We evaluated our method on the proteome of budding yeast. Our genetic algorithm implementation, maximizing the partial AUC for FPR below 0.05 or 0.10, outperformed other popular classification methods.
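The partial AUC named in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' implementation, so everything here is an assumption made for illustration: the function names `roc_points` and `partial_auc`, the use of trapezoidal integration with linear interpolation at the FPR cutoff, and the choice to return the raw (unnormalized) area, whose maximum value equals `max_fpr`.

```python
def roc_points(scores, labels):
    """Return ROC points (fpr, tpr) in order of decreasing score threshold.

    labels are 1 for essential (positive) and 0 for non-essential (negative).
    Examples with tied scores are added together and share one ROC point.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Consume every example that shares the current score.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def partial_auc(scores, labels, max_fpr=0.10):
    """Area under the ROC curve restricted to 0 <= FPR <= max_fpr (unnormalized)."""
    if not 0.0 < max_fpr <= 1.0:
        raise ValueError("max_fpr must be in (0, 1]")

    pts = roc_points(scores, labels)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the segment at the cutoff by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid rule
    return area
```

A function of this kind could serve as the fitness that a genetic algorithm maximizes, for example by scoring each candidate set of feature weights with `partial_auc(scores, labels, max_fpr=0.05)`. How the authors actually encode candidates and compute fitness is not stated in the abstract.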