Singh G B, Kramer J A, Krawetz S A
Bioinformatics Algorithms Research Division, National Center for Genome Resources, 1800 Old Pecos Trail, Santa Fe, NM 87505, USA.
Nucleic Acids Res. 1997 Apr 1;25(7):1419-25. doi: 10.1093/nar/25.7.1419.
The potentiation and subsequent initiation of transcription are complex biological phenomena. The regions of attachment of the chromatin fiber to the nuclear matrix, known as matrix attachment regions or scaffold attachment regions (MARs or SARs), are thought to be requisite for the transcriptional regulation of the eukaryotic genome. As expressed sequences should be contained in these regions, it is important to answer the following question: can these regions be identified from the primary sequence data alone and subsequently used as markers for expressed sequences? This paper represents an effort toward achieving this goal and describes a mathematical model for the detection of MARs. The location of matrix-associated regions has been linked to a variety of sequence patterns. Consequently, a list of these patterns was compiled and represented as a set of decision rules using an AND-OR formulation. The DNA sequence was then searched for the presence of these patterns, and a statistical significance was associated with the frequency of occurrence of the various patterns. Subsequently, a mathematical potential value, the MAR-Potential, was assigned to a sequence region in inverse proportion to the probability that the observed pattern population occurred at random. This MAR detection process was applied to the analysis of a variety of known MAR-containing sequences. Regions of matrix association predicted by the software essentially correspond to those determined experimentally. The human T-cell receptor and the DNA sequence from the Drosophila bithorax region were also analyzed. This demonstrates the usefulness of the approach described as a means to direct experimental resources.
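The scoring idea summarized above (search a window for patterns, estimate how likely the observed counts are by chance, and score the window inversely to that probability) can be sketched roughly as follows. This is a minimal illustration, not the authors' software. The motifs in `RULES` are hypothetical placeholders. The Poisson background with equiprobable bases, the independence of rules, the window size, and reporting log(1/p) instead of 1/p are all assumptions made for this sketch. Only the OR half of the AND-OR formulation is modeled.

```python
import math

# Degenerate base codes used by the placeholder motifs (W = A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "N": "ACGT"}

# Hypothetical rules: each rule is an OR over motifs. These are NOT the
# patterns from the paper, which the abstract does not list.
RULES = {
    "at_rich": ["WWWWWW"],
    "placeholder_motif": ["AATATT", "TTWTWTT"],
}

def motif_probability(motif):
    """Chance that a random position matches, assuming equiprobable bases."""
    p = 1.0
    for code in motif:
        p *= len(IUPAC[code]) / 4.0
    return p

def count_matches(window, motif):
    """Count (possibly overlapping) matches of a motif in the window."""
    m = len(motif)
    return sum(
        all(window[i + j] in IUPAC[code] for j, code in enumerate(motif))
        for i in range(len(window) - m + 1)
    )

def poisson_tail(k, lam, terms=200):
    """Approximate P(X >= k) for X ~ Poisson(lam) by summing the upper tail."""
    if k == 0:
        return 1.0
    # Start from the k-th term, computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    for i in range(k, k + terms):
        total += term
        term *= lam / (i + 1)
    return max(total, 1e-300)  # clamp so the logarithm below stays finite

def mar_potential(window):
    """Return log(1/p), with p the chance probability of the observed counts."""
    log_inverse_p = 0.0
    for motifs in RULES.values():
        observed = sum(count_matches(window, m) for m in motifs)
        expected = sum(
            motif_probability(m) * max(len(window) - len(m) + 1, 0)
            for m in motifs
        )
        # Rules are treated as independent (an assumption of this sketch).
        log_inverse_p += -math.log(poisson_tail(observed, expected))
    return log_inverse_p

def scan(sequence, window_size=100, step=10):
    """Slide a window along the sequence and score each position."""
    sequence = sequence.upper()
    return [
        (start, mar_potential(sequence[start:start + window_size]))
        for start in range(0, len(sequence) - window_size + 1, step)
    ]
```

Calling `scan(sequence)` returns `(start, score)` pairs. Under these assumptions, windows whose pattern counts are unlikely to arise by chance receive higher scores, and peaks in the profile would be the candidate regions to test experimentally.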