Yen Ming-Ren, Li Ya-Ru, Cheng Chia-Yi, Wu Ting-Ying, Liu Ming-Jung
Institute of Plant and Microbial Biology, Academia Sinica, Taipei, 115201, Taiwan.
Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan, 711, Taiwan.
Plant Mol Biol. 2025 Aug 1;115(4):102. doi: 10.1007/s11103-025-01632-3.
The recognition of translational initiation sites (TISs) offers complementary insights for identifying genes that encode novel proteins or small peptides. Conventional computational methods primarily identify Ribo-seq-supported TISs and lack the capacity for systematic, global identification of TISs, especially non-AUG sites in plants. Additionally, these methods are often unsuitable for evaluating the importance of mRNA sequence features in TIS determination. In this study, we present TISCalling, a robust framework that combines machine learning (ML) models and statistical analysis to identify and rank novel TISs across eukaryotes. TISCalling generalizes and ranks important features common to multiple plant and mammalian species while identifying kingdom-specific features such as mRNA secondary structures and "G"-nucleotide contents. Furthermore, TISCalling achieved high predictive power for identifying novel viral TISs. Importantly, TISCalling provides prediction scores for putative TISs along plant transcripts, enabling prioritization of those of interest for further validation. We offer TISCalling as a command-line-based package (https://github.com/yenmr/TISCalling) capable of generating prediction models and identifying key sequence features. Additionally, we provide web tools (https://predict.southerngenomics.org/TISCalling/) for visualizing pre-computed potential TISs, making the framework accessible to users without programming experience. The TISCalling framework offers a sequence-aware and interpretable approach for decoding genome sequences and exploring functional proteins in plants and viruses.
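To make the idea of scoring putative AUG and non-AUG sites along a transcript more concrete, the following minimal Python sketch enumerates candidate start codons and computes one simple sequence feature for each. It is an illustration only and not the TISCalling implementation: the function name `candidate_tis`, the `Candidate` record, the definition of near-cognate codons as those differing from AUG by one nucleotide, the 10-nt flanking window, and the use of G fraction as a feature are all assumptions chosen for the example.

```python
from typing import Iterator, NamedTuple

BASES = "ACGU"

# Assumption: "non-AUG" candidates are the nine near-cognate codons,
# i.e. codons that differ from AUG at exactly one position.
NEAR_COGNATE = {
    "AUG"[:i] + b + "AUG"[i + 1:]
    for i in range(3)
    for b in BASES
    if b != "AUG"[i]
}


class Candidate(NamedTuple):
    position: int      # 0-based index of the codon's first nucleotide
    codon: str         # the codon found at that position
    is_aug: bool       # True for AUG, False for a near-cognate codon
    g_fraction: float  # fraction of G in the codon plus flanking window


def candidate_tis(transcript: str, flank: int = 10) -> Iterator[Candidate]:
    """Yield every AUG or near-cognate codon in all three frames.

    Hypothetical helper: a real predictor would pass such candidates,
    with many more features, to a trained model to obtain a score.
    """
    seq = transcript.upper().replace("T", "U")  # accept DNA or RNA input
    for i in range(len(seq) - 2):
        codon = seq[i:i + 3]
        if codon == "AUG" or codon in NEAR_COGNATE:
            window = seq[max(0, i - flank): i + 3 + flank]
            g_fraction = window.count("G") / len(window)
            yield Candidate(i, codon, codon == "AUG", g_fraction)


if __name__ == "__main__":
    for cand in candidate_tis("GGCUGACCAUGGCG"):
        print(cand)
```

In a workflow of this kind, each candidate's features would be scored by a trained classifier, and the scores used to prioritize sites for validation; the sketch stops at candidate enumeration and feature extraction.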