Chakraborty Sagarika, Ardern Zachary, Aliyu Habibu, Kaster Anne-Kristin
Institute for Biological Interfaces 5 (IBG-5), Biotechnology and Microbial Genetics, Karlsruhe Institute of Technology (KIT), Hermann-von-Helmholtz-Platz 1, Eggenstein-Leopoldshafen 76344, Germany.
Wellcome Trust Sanger Institute, Hinxton, Saffron Walden CB10 1RQ, United Kingdom.
Comput Struct Biotechnol J. 2025 Jul 24;27:3565-3578. doi: 10.1016/j.csbj.2025.07.036. eCollection 2025.
Omics technologies have led to the discovery of a vast number of proteins that are expressed but lack functional annotation, the so-called hypothetical proteins (HPs). Even in the best-studied model organism, Escherichia coli K-12, over 2% of the proteome remains uncharacterized, and this knowledge gap is even wider for microbial dark matter. Yet knowing the functions of proteins is crucial for elucidating cellular and metabolic processes and for harnessing their biotechnological potential. Here, we used machine learning to decipher the transcriptional regulatory network of E. coli K-12 and, together with other tools, assigned functions to uncharacterized HPs. As proof of concept, we experimentally validated the predicted functions of three HP-encoding genes (, and ). To do so, we compared the growth patterns of deletion mutants with those of the wild type and analyzed the mutants' transcriptional responses to specific conditions. This study demonstrates that combining big omics data with artificial intelligence and experimental controls is a powerful approach to illuminating functional dark matter.