Ni Pengyu, Wu Siwen, Su Zhengchang
Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA.
NAR Genom Bioinform. 2023 Sep 22;5(3):lqad085. doi: 10.1093/nargab/lqad085. eCollection 2023 Sep.
Self-transcribing active regulatory region sequencing (STARR-seq) and its variants have been widely used to characterize enhancers. However, it has been reported that up to 87% of STARR-seq peaks are located in repressive chromatin and are not functional in the tested cells. While some of the STARR-seq peaks in repressive chromatin might be active in other cell/tissue types, others might be false positives. Meanwhile, many active enhancers may not be identified by current STARR-seq methods. Although methods have been proposed to mitigate systematic errors caused by the use of plasmid vectors, artifacts due to the intrinsic limitations of current STARR-seq methods are still prevalent, and their underlying causes are not fully understood. Based on predicted cis-regulatory modules (CRMs) and non-CRMs in the human genome, as well as predicted active CRMs and non-active CRMs in a few human cell lines/tissues with STARR-seq data available, we reveal prevalent false positives and false negatives in STARR-seq peaks generated by major variants of STARR-seq methods, along with their possible underlying causes. Our results will help design strategies to improve STARR-seq methods and to interpret their results.
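To make the idea of scoring peaks against predicted CRMs concrete, the sketch below classifies peaks by interval overlap. It is a hypothetical illustration, not the authors' pipeline: it assumes BED-style half-open intervals, treats any overlap of at least 1 bp as a hit, and uses made-up labels ("supported", "putative_fp", "unassigned") and helper names. A peak is flagged as a putative false positive if it overlaps no predicted active CRM but does overlap a non-CRM or predicted non-active CRM; a predicted active CRM hit by no peak is counted as a candidate false negative.

```python
from typing import Dict, List, Tuple

# Assumption: (chromosome, start, end) with half-open coordinates [start, end), as in BED
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_peaks(peaks: List[Interval],
                   active_crms: List[Interval],
                   inactive_regions: List[Interval]) -> Dict[str, int]:
    """Count peaks by what they overlap (hypothetical labels).

    - 'supported': overlaps at least one predicted active CRM
    - 'putative_fp': overlaps no active CRM but overlaps an inactive region
      (a non-CRM or a predicted non-active CRM)
    - 'unassigned': overlaps neither set
    """
    counts = {"supported": 0, "putative_fp": 0, "unassigned": 0}
    for p in peaks:
        if any(overlaps(p, c) for c in active_crms):
            counts["supported"] += 1
        elif any(overlaps(p, r) for r in inactive_regions):
            counts["putative_fp"] += 1
        else:
            counts["unassigned"] += 1
    return counts


def missed_active_crms(peaks: List[Interval], active_crms: List[Interval]) -> int:
    """Number of predicted active CRMs overlapped by no peak (candidate false negatives)."""
    return sum(1 for c in active_crms if not any(overlaps(c, p) for p in peaks))


if __name__ == "__main__":
    # Toy coordinates for illustration only, not real data
    peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 20)]
    active = [("chr1", 150, 300), ("chr1", 1000, 1100)]
    inactive = [("chr1", 550, 700)]
    print(classify_peaks(peaks, active, inactive))
    print(missed_active_crms(peaks, active))
```

The pairwise scan is quadratic and only suits toy inputs; genome-scale peak and CRM sets would normally be intersected with a sorted-interval tool instead.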