Sequence-based information-theoretic features for gene essentiality prediction.

Author information

Nigatu Dawit, Sobetzko Patrick, Yousef Malik, Henkel Werner

Affiliations

Transmission Systems Group, Jacobs University Bremen, Campus Ring 1, Bremen, D-28759, Germany.

Philipps-Universität Marburg, LOEWE-Zentrum für Synthetische Mikrobiologie, Hans-Meerwein-Straße, Mehrzweckgebäude, Marburg, 35043, Germany.

Publication information

BMC Bioinformatics. 2017 Nov 9;18(1):473. doi: 10.1186/s12859-017-1884-5.

DOI:10.1186/s12859-017-1884-5

PMID:29121868

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5679510/

Abstract

BACKGROUND

Identification of essential genes is not only useful for our understanding of the minimal gene set required for cellular life but also aids the identification of novel drug targets in pathogens. In this work, we present a simple and effective gene essentiality prediction method using information-theoretic features that are derived exclusively from the gene sequences.
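To make "information-theoretic features derived exclusively from the gene sequences" concrete, here is a minimal sketch of two such quantities computed from a raw nucleotide string: Shannon entropy of the k-mer distribution and mutual information between bases a fixed distance apart. The abstract does not list the exact features used, so the choice of these two quantities, the function names `shannon_entropy` and `mutual_information`, and the parameters `k` and `gap` are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def shannon_entropy(seq, k=1):
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq.

    Illustrative feature only; `k` is a hypothetical parameter.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mutual_information(seq, gap=1):
    """Mutual information (in bits) between bases `gap` positions apart.

    Illustrative feature only; `gap` is a hypothetical parameter.
    """
    pairs = [(seq[i], seq[i + gap]) for i in range(len(seq) - gap)]
    total = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)    # marginal counts of the first base
    right = Counter(b for _, b in pairs)   # marginal counts of the second base
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / total
        # p_ab / (p_a * p_b) written in terms of raw counts
        mi += p_ab * math.log2(c * total / (left[a] * right[b]))
    return mi


if __name__ == "__main__":
    gene = "ATGGCGTACGCTAGCTAGGCTAACGTTAG"  # made-up sequence, for illustration
    features = [shannon_entropy(gene, k=1),
                shannon_entropy(gene, k=3),
                mutual_information(gene, gap=1)]
    print(features)
```

A feature vector of this kind needs nothing beyond the sequence itself, which is the property the abstract emphasizes: no ortholog lookup and no additional experimental data.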

RESULTS

We developed a Random Forest classifier and performed an extensive model performance evaluation among and within 15 selected bacteria. In intra-organism predictions, where training and testing sets are taken from the same organism, AUC (Area Under the Curve) scores ranging from 0.73 to 0.90, 0.84 on average, were obtained. Cross-organism predictions using 5-fold cross-validation, pairwise, leave-one-species-out, leave-one-taxon-out, and cross-taxon yielded average AUC scores of 0.88, 0.75, 0.80, 0.82, and 0.78, respectively. To further show the applicability of our method in other domains of life, we predicted the essential genes of the yeast Schizosaccharomyces pombe and obtained a similar accuracy (AUC 0.84).
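The evaluation schemes named above can be sketched in a few lines. The example below shows, under stated assumptions, how a leave-one-species-out split can be generated and how an AUC score can be computed from classifier outputs as the fraction of (essential, non-essential) gene pairs ranked correctly. The helper names `leave_one_species_out` and `auc_score` and the toy data are hypothetical; this is not the authors' evaluation code, and the classifier itself (a Random Forest in the paper) is left out.

```python
def leave_one_species_out(species_ids):
    """Yield (held_out, train_indices, test_indices) for each species.

    Each species is held out once for testing while all genes from the
    remaining species form the training set.
    """
    for held_out in sorted(set(species_ids)):
        train = [i for i, s in enumerate(species_ids) if s != held_out]
        test = [i for i, s in enumerate(species_ids) if s == held_out]
        yield held_out, train, test


def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for essential, 0 for non-essential.
    scores: predicted essentiality scores (higher means more likely essential).
    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    species = ["A", "A", "B", "B", "C"]
    for held_out, train, test in leave_one_species_out(species):
        print(held_out, train, test)
    print(auc_score([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

The pairwise formulation is quadratic in the number of genes; it is used here because it makes the meaning of AUC explicit, not because it is the efficient choice for genome-scale data.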

CONCLUSIONS

The proposed method enables simple and reliable identification of essential genes without searching databases for orthologs and without requiring further experimental data such as network topology and gene expression.

