Ruess Holly, Lee Jeon, Guzman Carlos, Malladi Venkat S, D'Orso Iván
Lyda Hill Department of Bioinformatics, The University of Texas Southwestern Medical Center, Dallas, TX, USA.
Department of Microbiology, The University of Texas Southwestern Medical Center, Dallas, TX, USA.
Bioinform Biol Insights. 2022 Feb 26;16:11779322211072333. doi: 10.1177/11779322211072333. eCollection 2022.
Fundamental principles of HIV-1 integration into the human genome have been revealed in the past 2 decades. However, the impact of the integration site on proviral transcription and expression remains poorly understood. Solving this problem requires the analysis of multiple genomic datasets for thousands of proviral integration sites. Here, we generated and combined large-scale datasets covering epigenetics, the transcriptome, and 3-dimensional genome architecture to interrogate the chromatin states, transcription activity, and nuclear sub-compartments around HIV-1 integrations in Jurkat CD4 T cells. The aim was to decipher the human genome regulatory features that shape the transcription of proviral classes, which are defined by their position and orientation in the genome. Using a Hidden Markov Model, and ranking informative values before fitting a machine learning logistic regression model, we defined the nuclear sub-compartments and chromatin states that contribute to the genomic architecture, transcriptional activity, and nucleosome density of regions neighboring the integration site, and found that they act as additive features influencing HIV-1 expression. Our integrated genomics approach also allows for a robust experimental design in which HIV-1 can be genetically introduced into precise genomic locations with known regulatory features to assess the relationship of integration positions to viral transcription and fate.
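The abstract does not spell out how the informative values were computed or ranked. As a minimal sketch, assuming they correspond to the standard information value (IV) statistic built from weight of evidence, the ranking step that precedes a logistic regression could look like the following. The feature names, the sub-compartment labels, the toy data, and the smoothing constant are all hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def information_value(categories, labels, smoothing=0.5):
    """Information value of one categorical feature against a binary label.

    IV = sum over levels of (p - q) * ln(p / q), where p and q are the
    shares of label-1 and label-0 observations falling in that level.
    The additive smoothing (an assumption) avoids division by zero.
    """
    pos = Counter(c for c, y in zip(categories, labels) if y == 1)
    neg = Counter(c for c, y in zip(categories, labels) if y == 0)
    levels = sorted(set(categories))
    n_pos = sum(pos.values()) + smoothing * len(levels)
    n_neg = sum(neg.values()) + smoothing * len(levels)
    iv = 0.0
    for level in levels:
        p = (pos[level] + smoothing) / n_pos
        q = (neg[level] + smoothing) / n_neg
        iv += (p - q) * math.log(p / q)
    return iv


def rank_features(features, labels):
    """Return (feature name, IV) pairs sorted from most to least informative."""
    scores = {name: information_value(values, labels)
              for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 1 = expressed provirus, 0 = silent provirus.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "sub_compartment": ["A1", "A1", "A1", "A2", "B1", "B1", "B2", "A2"],
    "orientation": ["same", "opposite", "same", "opposite",
                    "same", "opposite", "same", "opposite"],
}

ranking = rank_features(features, labels)
for name, iv in ranking:
    print(f"{name}: IV = {iv:.3f}")
```

In a workflow of this kind, only the top-ranked features would then be passed as predictors to the logistic regression. The cutoff for "top-ranked" is a modeling choice that the abstract does not specify.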