Biozentrum, Julius-Maximilians-Universität, Würzburg 97074, Germany.
Max-Delbrück-Centrum für Molekulare Medizin (MDC), Helmholtz-Gemeinschaft, Berlin 13125, Germany.
Bioinformatics. 2022 Sep 2;38(17):4162-4171. doi: 10.1093/bioinformatics/btac488.
A recent approach to genetic tracing of complex biological problems involves the generation of synthetic deoxyribonucleic acid (DNA) probes that specifically mark cells with a phenotype of interest. These synthetic locus control regions (sLCRs), in turn, drive the expression of a reporter gene, such as a fluorescent protein. To build functional and specific sLCRs, it is critical to accurately select multiple bona fide cis-regulatory elements from the cistrome of the target cell phenotype. This selection maximizes the number and diversity of transcription factors (TFs) within the sLCR while keeping the size of the final sLCR limited.
In this work, we discuss how optimization, in particular integer programming, can be used to systematically address the construction of a specific sLCR and optimize pre-defined properties of the sLCR. The linear optimization problem we present maximizes the activation potential of the sLCR subject to two constraints: its size is limited to a pre-defined length, and a minimum number of the TFs deemed sufficiently characteristic for the phenotype of interest is covered. We generated an sLCR to trace the mesenchymal glioblastoma program in patients by solving the corresponding linear program with the software optimizer Gurobi. Using the binding strength of transcription factor binding sites (TFBSs) to their TFs as a proxy for activation potential, the optimized sLCR scores similarly to an sLCR experimentally validated in vivo and is smaller in size while having the same coverage of TFBSs.
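To make the structure of this selection problem concrete, the following is a minimal sketch on a toy instance. It is not the authors' implementation and it does not use Gurobi: it enumerates all subsets of candidate elements, which is only feasible for a handful of candidates, whereas a real instance would be passed to an integer programming solver. The candidate names, lengths, TF labels and binding scores are hypothetical, and scoring a selection as the plain sum of its binding scores is an assumption made for illustration.

```python
from itertools import combinations

# Hypothetical toy data: (element name, length in bp, {TF: binding score}).
CANDIDATES = [
    ("cre_a", 150, {"TF1": 2.0, "TF2": 1.0}),
    ("cre_b", 200, {"TF2": 3.0, "TF3": 1.5}),
    ("cre_c", 120, {"TF1": 1.0}),
    ("cre_d", 180, {"TF3": 2.5, "TF4": 2.5}),
]


def select_elements(candidates, max_length, min_tfs_covered):
    """Exhaustively solve the 0-1 selection problem on a small instance.

    Maximizes the summed binding score (assumed proxy for activation
    potential) subject to a total length limit and a minimum number of
    distinct TFs covered. Returns (names, score), or None if infeasible.
    """
    best = None
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            # Constraint 1: the assembled sLCR must not exceed the length limit.
            total_length = sum(length for _, length, _ in subset)
            if total_length > max_length:
                continue
            # Constraint 2: enough distinct TFs must have at least one site.
            covered = set().union(*(scores.keys() for _, _, scores in subset))
            if len(covered) < min_tfs_covered:
                continue
            # Objective: total binding score over all selected elements.
            score = sum(sum(scores.values()) for _, _, scores in subset)
            if best is None or score > best[1]:
                best = (tuple(name for name, _, _ in subset), score)
    return best


if __name__ == "__main__":
    print(select_elements(CANDIDATES, max_length=400, min_tfs_covered=3))
    print(select_elements(CANDIDATES, max_length=400, min_tfs_covered=4))
```

In this toy instance, raising the required TF coverage from three to four forces a lower-scoring selection, which illustrates the trade-off between activation potential and coverage under a fixed length budget. In an integer programming formulation, each element would get a binary selection variable and each TF a binary coverage variable linked to the elements that contain its binding sites.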
We provide a Python implementation of the presented framework in the Supplementary Material, with which an optimal selection of cis-regulatory elements can be calculated once the target set of TFs and their binding strengths to their TFBSs are known.
Supplementary data are available at Bioinformatics online.