Hu Fengyuan, Lu Jia, Matheson Louise S, Díaz-Muñoz Manuel D, Saveliev Alexander, Xu Jinbo, Turner Martin
Laboratory of Lymphocyte Signalling and Development, The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK.
Bioinformatics. 2021 Oct 11;37(19):3152-3159. doi: 10.1093/bioinformatics/btab339.
The annotation of small open reading frames (smORFs) of <100 codons (<300 nucleotides) is challenging due to the large number of such sequences in the genome.
We developed ORFLine, a computational pipeline that stringently identifies smORFs and classifies them according to their position within transcripts. Using ORFLine, we identified and systematically characterized a total of 5744 unique smORFs in datasets from mouse B and T lymphocytes. We further searched the smORFs for a signal peptide; this search recovered known secreted chemokines and predicted novel micropeptides. Four novel micropeptides show evidence of secretion and are therefore candidate mediators of immunoregulatory functions.
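To make the two ideas above concrete (the <100-codon size cutoff and classification by position within a transcript), here is a minimal Python sketch. It is not ORFLine's implementation: the function names `find_smorfs` and `classify`, the ATG-only start rule, the choice to exclude the stop codon from the codon count, and the category labels are all assumptions made for illustration. The abstract does not describe how ORFLine performs these steps.

```python
from typing import List, NamedTuple, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_CODONS = 100  # smORFs are defined in the text as <100 codons


class Orf(NamedTuple):
    start: int   # 0-based index of the first base of the start codon
    end: int     # exclusive end index, stop codon included
    codons: int  # codon count, stop codon excluded (an assumption)


def find_smorfs(seq: str, max_codons: int = MAX_CODONS) -> List[Orf]:
    """Return ATG-initiated ORFs shorter than max_codons, forward strand only."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Extend codon by codon until an in-frame stop or the sequence end.
            j = i + 3
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop remains in this frame
            codons = (j - i) // 3
            if codons < max_codons:
                orfs.append(Orf(i, j + 3, codons))
            i = j + 3  # resume scanning after the stop codon
    return sorted(orfs)


def classify(orf: Orf, cds: Optional[Tuple[int, int]]) -> str:
    """Label an ORF relative to an annotated CDS given as (start, end)."""
    if cds is None:
        return "noncoding transcript"
    cds_start, cds_end = cds
    if (orf.start, orf.end) == (cds_start, cds_end):
        return "annotated CDS"
    if orf.end <= cds_start:
        return "upstream"
    if orf.start >= cds_end:
        return "downstream"
    return "overlapping CDS"


# Toy transcript with three short ORFs; the middle one plays the annotated CDS.
transcript = "CC" + "ATGAAATAA" + "C" + "ATGGCTGCTTAA" + "GG" + "ATGCCCTAG"
smorfs = find_smorfs(transcript)
labels = [classify(o, cds=(12, 24)) for o in smorfs]
```

On the toy transcript, `labels` pairs each detected ORF with its position relative to the hypothetical CDS at coordinates 12 to 24. A real pipeline would also need evidence of translation and transcript annotations, which this sketch leaves out.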
ORFLine is freely available at https://github.com/boboppie/ORFLine.
Supplementary data are available at Bioinformatics online.