Hubei Key Lab of Genetic Regulation and Integrative Biology, School of Life Sciences, Central China Normal University, No. 152 Luoyu Road, Wuhan 430079, China.
Int J Mol Sci. 2021 May 22;22(11):5476. doi: 10.3390/ijms22115476.
Small open reading frames (sORFs) have the translational potential to produce peptides that play essential roles in various biological processes. Nevertheless, many sORF-encoded peptides (SEPs) remain at the prediction level. Here, we developed a strategy for analyzing SEPs that combines top-down and de novo sequencing to improve SEP identification and sequence coverage. With de novo sequencing, we identified 1682 peptides mapping to 2544 human sORFs, all of which were characterized for the first time in this work. Two-thirds of these new sORFs have reading frame shifts and use a non-ATG start codon. The top-down approach identified 241 human SEPs with high sequence coverage. The average peptide length was 19 amino acids (AA) from the bottom-up database search, 9 AA from de novo sequencing, and 25 AA from the top-down approach. Longer peptides increase sequence coverage and distinguish SEPs from known gene coding sequences more efficiently. The top-down approach has the advantage of identifying peptides with consecutive K/R residues or high K/R content, which are unfavorable for the bottom-up approach. Our method can uncover new coding sORFs and obtain highly accurate sequences of their SEPs, which can also benefit future functional research.
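The point about K/R-rich peptides can be illustrated with a small in silico digestion. The sketch below is not the authors' pipeline. It assumes that the bottom-up workflow uses trypsin, which cleaves after K or R (a simplified rule with no cleavage before P and no missed cleavages), and that peptides shorter than 7 residues are not identified. The sequence, the function names, and the length cutoff are all hypothetical.

```python
import re
from typing import List


def tryptic_digest(sequence: str) -> List[str]:
    """Split after every K or R unless the next residue is P (simplified trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def coverage(sequence: str, min_len: int = 7) -> float:
    """Fraction of residues in digestion products of at least min_len residues.

    min_len = 7 is an assumed identification cutoff, not a value from the abstract.
    """
    kept = [p for p in tryptic_digest(sequence) if len(p) >= min_len]
    return sum(len(p) for p in kept) / len(sequence)


# Hypothetical 22-residue SEP with a run of consecutive K/R residues.
sep = "MAGLSPEKRKKRAVLTDEWGSR"
peptides = tryptic_digest(sep)
print(peptides)
print(f"bottom-up coverage estimate: {coverage(sep):.2f}")
```

In this toy case the K/R run is cut into single-residue fragments that fall below the cutoff, so that stretch of the sequence goes unobserved. A top-down measurement analyzes the intact peptide and involves no such digestion step.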