Fesenko Igor, Sahakyan Harutyun, Dhyani Rajat, Shabalina Svetlana A, Storz Gisela, Koonin Eugene V
Computational Biology Branch, Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA.
Mol Cell. 2025 Mar 6;85(5):1024-1041.e6. doi: 10.1016/j.molcel.2025.01.025. Epub 2025 Feb 19.
Microproteins encoded by small open reading frames comprise the "dark matter" of proteomes. Although microproteins have been detected in diverse organisms from all three domains of life, many more remain to be identified, and only a few have been functionally characterized. In this comprehensive study of intergenic small open reading frames (ismORFs, 15-70 codons) in 5,668 bacterial genomes of the family Enterobacteriaceae, we identify 67,297 clusters of ismORFs subject to purifying selection. Expression of tagged Escherichia coli microproteins is detected for 11 of the 16 tested, validating the predictions. Although the ismORFs mainly code for hydrophobic, potentially transmembrane, unstructured, or minimally structured microproteins, some globular folds, oligomeric structures, and possible interactions with proteins encoded by neighboring genes are predicted. Complete information on the predicted microprotein families, including evidence of transcription and translation, and structure predictions are available as an easily searchable resource for investigation of microprotein functions.