Terry Fox Laboratory, BC Cancer, Vancouver, BC V5Z 1L3, Canada.
Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.
G3 (Bethesda). 2022 Jul 29;12(8). doi: 10.1093/g3journal/jkac142.
In the past decade, there has been a growing appreciation for R-loop structures as important regulators of the epigenome, telomere maintenance, DNA repair, and replication. Given these numerous functions, dozens, or potentially hundreds, of proteins could serve as direct or indirect regulators of R-loop writing, reading, and erasing. To understand the properties shared among potential R-loop binding proteins, we mined published proteomic studies and distilled 10 features that were enriched in R-loop binding proteins compared with the rest of the proteome. Applying an easy-ensemble machine learning approach, we used these R-loop binding protein-specific features, along with amino acid composition, to create random forest classifiers that predict the likelihood that a protein binds R-loops. Known R-loop regulating pathways such as splicing, DNA damage repair, and chromatin remodeling are highly enriched in our datasets, and we validate 2 new R-loop binding proteins, LIG1 and FXR1, in human cells. Together, these datasets provide a reference for pursuing analyses of novel R-loop regulatory proteins.
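The easy-ensemble idea mentioned above can be illustrated with a minimal sketch. It assumes the general scheme in which the small positive class (known R-loop binders) is paired with repeated, equally sized random draws from the much larger negative class, one classifier is trained per balanced subset, and the per-classifier scores are averaged. Everything here is hypothetical and for illustration only: the function names, the toy two-feature data, and in particular the nearest-centroid base learner, which merely stands in for the random forest classifiers described in the abstract.

```python
import random
from statistics import mean


def balanced_subsets(positives, negatives, n_subsets, seed=0):
    """Yield n_subsets balanced training sets: all positives plus an
    equally sized random draw (without replacement) from the negatives."""
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    for _ in range(n_subsets):
        sampled_neg = rng.sample(negatives, k)
        yield positives + sampled_neg, [1] * len(positives) + [0] * k


def fit_centroid_model(X, y):
    """Hypothetical stand-in base learner (NOT a random forest):
    scores 1.0 if a point is closer to the positive-class centroid."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    pos_c = [mean(col) for col in zip(*pos)]
    neg_c = [mean(col) for col in zip(*neg)]

    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
        return 1.0 if d_pos < d_neg else 0.0

    return predict


def ensemble_score(models, protein):
    """Average the per-model scores for one protein's feature vector."""
    return mean(model(protein) for model in models)


# Toy feature vectors (assumed values, not data from the study).
positives = [[0.9, 0.8], [0.8, 0.9], [1.0, 0.7]]
negatives = [[0.1 + 0.01 * i, 0.2] for i in range(12)]

subsets = list(balanced_subsets(positives, negatives, n_subsets=5))
models = [fit_centroid_model(X, y) for X, y in subsets]

score_hi = ensemble_score(models, [0.95, 0.85])
score_lo = ensemble_score(models, [0.10, 0.20])
```

Because each subset is balanced, no single model is dominated by the majority class, and averaging over several draws lets more of the negative examples contribute than any one undersampled set would.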