Bioinformatics and Genomics Department, Centro de Investigación Príncipe Felipe, Valencia 46013, Spain.
BMC Bioinformatics. 2010 Nov 8;11:551. doi: 10.1186/1471-2105-11-551.
Transcription factors (TFs) control transcription by binding to specific regions of DNA called transcription factor binding sites (TFBSs). Identifying TFBSs is a crucial problem in computational biology; one of its subtasks is predicting the location of known TFBS motifs in a given DNA sequence. It has previously been shown that, when scoring matches to known TFBS motifs, interdependencies between positions within a motif should be taken into account. This remains challenging, however, because sequences similar to known TFBSs can occur by chance with relatively high frequency. Here we present a new method for matching sequences to TFBS motifs based on intuitionistic fuzzy sets (IFS) theory, an approach that has been shown to be particularly appropriate for problems that involve a high degree of uncertainty.
We propose SCintuit, a new scoring method for measuring sequence-motif affinity based on IFS theory. Unlike existing methods that consider dependencies between positions, SCintuit is designed to prevent overestimation of the less conserved positions of TFBSs. For a given pair of bases, SCintuit is computed not only as a function of their combined probability of occurrence but also as a function of the individual importance of each base at its corresponding position. We used SCintuit to identify known TFBSs in DNA sequences. On both synthetic and real data, our method outperformed two existing methods in sensitivity and specificity in every experiment we performed.
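The abstract does not give the SCintuit formula, so the following Python sketch is only a toy illustration of the general idea described above: a score for a pair of bases that combines their joint frequency among aligned binding sites with a per-position importance weight. Everything in it is an assumption made for illustration, including the function names, the use of scaled information content as the "importance" of a position, the restriction to adjacent position pairs, and the example sites. It does not implement IFS theory and should not be read as the authors' method.

```python
import math
from collections import Counter

def column_importance(sites, i):
    """Assumed importance of column i: information content scaled to [0, 1].

    A fully conserved DNA column gives 1.0; a uniform column gives 0.0.
    """
    n = len(sites)
    counts = Counter(site[i] for site in sites)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return (2.0 - entropy) / 2.0  # 2 bits is the maximum entropy for 4 bases

def pair_frequency(sites, i, j, a, b):
    """Fraction of aligned sites with base a at column i and base b at column j."""
    return sum(1 for s in sites if s[i] == a and s[j] == b) / len(sites)

def toy_pair_score(sites, i, j, a, b):
    """Joint frequency of (a, b), down-weighted when either column is poorly conserved."""
    weight = (column_importance(sites, i) + column_importance(sites, j)) / 2.0
    return pair_frequency(sites, i, j, a, b) * weight

def toy_window_score(sites, window):
    """Mean toy pair score over adjacent column pairs of one candidate window."""
    width = len(sites[0])
    pairs = [(i, i + 1) for i in range(width - 1)]
    total = sum(toy_pair_score(sites, i, j, window[i], window[j]) for i, j in pairs)
    return total / len(pairs)

def scan(sites, sequence):
    """Score every window of motif width in the sequence; returns (offset, score) pairs."""
    width = len(sites[0])
    return [
        (k, toy_window_score(sites, sequence[k:k + width]))
        for k in range(len(sequence) - width + 1)
    ]

# Hypothetical aligned sites: columns 0-2 are conserved, column 3 is not.
sites = ["ACGT", "ACGA", "ACGC", "ACGG"]
best_offset, best_score = max(scan(sites, "TTACGTTT"), key=lambda t: t[1])
print(best_offset, round(best_score, 3))
```

In this toy setup the unconserved last column contributes little to the window score, which mirrors the stated design goal of not overestimating less conserved positions; how SCintuit actually achieves this is described in the full paper, not here.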
The results show that SCintuit improves on the TF prediction quality of existing approaches without compromising sensitivity. In addition, we show how SCintuit can be successfully applied to real research problems. This study demonstrates the reliability of IFS theory for motif discovery tasks.