Beder Thomas, Aromolaran Olufemi, Dönitz Jürgen, Tapanelli Sofia, Adedeji Eunice O, Adebiyi Ezekiel, Bucher Gregor, Koenig Rainer
Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany.
Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria.
NAR Genom Bioinform. 2021 Nov 30;3(4):lqab110. doi: 10.1093/nargab/lqab110. eCollection 2021 Dec.
Identifying essential genes on a genome scale is resource intensive and has been performed for only a few eukaryotes. For less studied organisms, essentiality can be predicted by gene homology, but this approach cannot be applied to non-conserved genes. Additionally, essentiality information diverges depending on whether it is obtained from single cells or from whole, multicellular organisms, and particularly between human cell line screens and human population studies. We employed machine learning across six model eukaryotes and 60 381 genes, using 41 635 features derived from the sequence, gene function information and network topology. In a leave-one-organism-out cross-validation, the classifiers showed high generalizability, with an average accuracy close to 80% in the left-out species. As a case study, we applied the method to [...] and [...] and validated the predictions experimentally, obtaining similar performance. Finally, the classifier based on the studied model organisms enabled linking the essentiality information from human cell line screens and population studies.
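The leave-one-organism-out scheme mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the nearest-centroid classifier is a toy stand-in for whatever model they trained, and the organism names, two-dimensional feature vectors and labels are made up for illustration. The only part taken from the abstract is the splitting logic, in which every gene of one organism is held out for testing while the classifier is fitted on the genes of all remaining organisms.

```python
from collections import defaultdict


def leave_one_organism_out(samples):
    """Yield (held_out, train, test) splits, one per organism.

    `samples` is a list of (organism, features, label) tuples.
    """
    organisms = sorted({org for org, _, _ in samples})
    for held_out in organisms:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


def fit_centroids(train):
    """Toy stand-in classifier: mean feature vector per label."""
    sums, counts = {}, defaultdict(int)
    for _, x, y in train:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [a + b for a, b in zip(sums[y], x)]
        counts[y] += 1
    return {y: [v / counts[y] for v in vec] for y, vec in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sqdist(centroids[y]))


def evaluate(samples):
    """Accuracy on each held-out organism, trained on all the others."""
    accuracies = {}
    for held_out, train, test in leave_one_organism_out(samples):
        centroids = fit_centroids(train)
        correct = sum(predict(centroids, x) == y for _, x, y in test)
        accuracies[held_out] = correct / len(test)
    return accuracies


# Hypothetical toy data: (organism, features, essential? 1/0).
toy = [
    ("org_a", [0.9, 0.8], 1), ("org_a", [0.1, 0.2], 0),
    ("org_b", [0.8, 0.9], 1), ("org_b", [0.2, 0.1], 0),
    ("org_c", [0.7, 0.9], 1), ("org_c", [0.3, 0.2], 0),
]
per_organism = evaluate(toy)
mean_accuracy = sum(per_organism.values()) / len(per_organism)
```

Averaging the per-organism accuracies, as in `mean_accuracy`, corresponds to the kind of figure the abstract reports for the left-out species. Splitting by organism rather than by gene is what makes that number an estimate of transfer to an unseen species.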