The Second Hospital of Jilin University, Changchun, Jilin Province, China.
Appl Biochem Biotechnol. 2023 Oct;195(10):6020-6031. doi: 10.1007/s12010-023-04322-2. Epub 2023 Feb 10.
Studying protein-coding gene structure and protein-related genes in kidney stone disease depends on accurately identifying splicing sites and locating exon boundaries, which is one of the key difficulties in understanding the genome and discovering new genes. Prediction techniques based on the signal characteristics of conserved sequences around splicing sites, such as the weighted array model (WAM), are widely used. Building on these techniques, this study mined several further features that can be used for splicing site recognition, such as the base composition of the sequences upstream and downstream of a splicing site, and the way the signal and the base composition of those sequences change with the C+G content of adjacent sequences, and developed a model to describe them. A log-linear model that integrates these features for splicing site recognition was then designed and implemented in a programme called SpliceKey. The results show that SpliceKey identifies splicing sites much more accurately than the WAM approach and also more accurately than DGSplice.
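To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the SpliceKey implementation. It assumes that a WAM can be approximated as a position-specific first-order Markov model scored as a log-odds ratio against a decoy model, and that the log-linear model is a weighted sum of feature values passed through a logistic function. The toy sequences, the function names, the C+G feature, and the weights are all hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"


def train_wam(seqs, pseudocount=1.0):
    """Estimate a position-specific first-order Markov model.

    Returns (first, trans): first[b] is P(b at position 0) and
    trans[i][(a, b)] is P(b at position i | a at position i - 1).
    All sequences are assumed to be aligned and of equal length.
    """
    length = len(seqs[0])
    first_counts = {b: pseudocount for b in BASES}
    trans_counts = [
        {(a, b): pseudocount for a in BASES for b in BASES}
        for _ in range(length)
    ]
    for s in seqs:
        first_counts[s[0]] += 1
        for i in range(1, length):
            trans_counts[i][(s[i - 1], s[i])] += 1

    total_first = sum(first_counts.values())
    first = {b: c / total_first for b, c in first_counts.items()}
    trans = [{} for _ in range(length)]  # trans[0] is unused
    for i in range(1, length):
        for a in BASES:
            row_total = sum(trans_counts[i][(a, b)] for b in BASES)
            for b in BASES:
                trans[i][(a, b)] = trans_counts[i][(a, b)] / row_total
    return first, trans


def wam_log_prob(seq, model):
    """Log-probability of an aligned window under one model."""
    first, trans = model
    lp = math.log(first[seq[0]])
    for i in range(1, len(seq)):
        lp += math.log(trans[i][(seq[i - 1], seq[i])])
    return lp


def wam_score(seq, true_model, decoy_model):
    """Log-odds of a window: true-site model versus decoy model."""
    return wam_log_prob(seq, true_model) - wam_log_prob(seq, decoy_model)


def gc_content(seq):
    """Fraction of C and G bases in a sequence."""
    return sum(ch in "GC" for ch in seq) / len(seq)


def log_linear_score(features, weights, bias=0.0):
    """Linear combination of features (the log-odds in a log-linear model)."""
    return bias + sum(w * f for w, f in zip(weights, features))


def site_probability(score):
    """Map a log-odds score to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical toy data: 9-base windows around a donor-like GT dinucleotide.
true_sites = ["AAGGTAAGT", "CAGGTGAGT", "AAGGTAAGA", "CAGGTAAGG"]
decoy_sites = ["ACGGTCCTA", "TTGGTCATC", "GCGGTTTCA", "CTGGTACCG"]

true_model = train_wam(true_sites)
decoy_model = train_wam(decoy_sites)

candidate = "CAGGTAAGT"
features = [wam_score(candidate, true_model, decoy_model), gc_content(candidate)]
weights = [1.0, 0.5]  # hypothetical, hand-picked rather than fitted
print(site_probability(log_linear_score(features, weights)))
```

In practice the weights would be fitted to labelled splicing sites rather than set by hand, and the feature list would include the upstream and downstream composition features the abstract mentions.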