Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA.
Nucleic Acids Res. 2021 Jul 21;49(13):e77. doi: 10.1093/nar/gkab349.
Deep learning has demonstrated its predictive power in modeling complex biological phenomena such as gene expression. The value of these models hinges not only on their accuracy, but also on the ability to extract biologically relevant information from the trained models. While there has been much recent work on developing feature attribution methods that discover the most important features for a given sequence, inferring cooperativity between regulatory elements, which is the hallmark of phenomena such as gene expression, remains an open problem. We present SATORI, a Self-ATtentiOn based model to detect Regulatory element Interactions. Our approach combines convolutional layers with a self-attention mechanism that helps us capture a global view of the landscape of interactions between regulatory elements in a sequence. A comprehensive evaluation demonstrates the ability of SATORI to identify numerous statistically significant interactions between transcription factors (TF-TF interactions), many of which have been previously reported. Our method detects more experimentally verified TF-TF interactions than existing methods, and has the advantage of not requiring a computationally expensive post-processing step. Finally, SATORI can be used for detection of any type of feature interaction in models that use a similar attention mechanism, and is not limited to the detection of TF-TF interactions.
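The combination of convolutional layers and self-attention described in the abstract can be illustrated with a toy sketch. This is not the SATORI implementation: the weights are random and untrained, there is a single attention head, and every name and size here (`conv1d_relu`, `self_attention`, the filter width, the example sequence) is an assumption chosen for illustration. It only shows the general idea that a convolution turns a one-hot DNA sequence into per-position motif-like features, and that self-attention then produces a position-by-position weight matrix covering the whole sequence.

```python
import math
import random


def one_hot(seq):
    """Encode a DNA string as a list of 4-dimensional one-hot rows."""
    table = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
             "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}
    return [table[base] for base in seq]


def rand_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def conv1d_relu(x, filters):
    """Valid 1D convolution over positions, followed by ReLU.

    x: L x 4 one-hot rows; filters: list of (k x 4) matrices.
    Returns (L - k + 1) rows, one activation per filter.
    """
    k = len(filters[0])
    out = []
    for i in range(len(x) - k + 1):
        row = []
        for f in filters:
            s = sum(f[j][c] * x[i + j][c] for j in range(k) for c in range(4))
            row.append(max(0.0, s))
        out.append(row)
    return out


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(row[t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [v / total for v in exps]


def self_attention(h, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    Returns the attention matrix (positions x positions) and the
    attention-weighted values.
    """
    q, k, v = matmul(h, wq), matmul(h, wk), matmul(h, wv)
    d = len(q[0])
    scores = [[sum(qi[t] * kj[t] for t in range(d)) / math.sqrt(d)
               for kj in k] for qi in q]
    attn = [softmax(r) for r in scores]
    return attn, matmul(attn, v)


rng = random.Random(0)
n_filters, width, d_attn = 4, 3, 5          # toy sizes (assumptions)
sequence = "ACGTACGTGGCA"                   # arbitrary example sequence

filters = [rand_matrix(width, 4, rng) for _ in range(n_filters)]
features = conv1d_relu(one_hot(sequence), filters)

wq = rand_matrix(n_filters, d_attn, rng)
wk = rand_matrix(n_filters, d_attn, rng)
wv = rand_matrix(n_filters, d_attn, rng)
attn, context = self_attention(features, wq, wk, wv)

# attn[i][j] is the weight position i places on position j.
print(len(attn), "x", len(attn[0]), "attention matrix")
```

In a trained model of this kind, large entries of `attn` between positions where different filters fire would be the natural place to look for candidate interactions. How SATORI turns such weights into statistically significant calls is not specified in the abstract and is not attempted here.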