Sequence-based prioritization of i-Motif candidates in the human genome.

Author information

Remori Veronica, Prest Michela, Fasano Mauro

Affiliations

Department of Science and High Technology, University of Insubria, Como, Italy.

Center of Neuroscience Research, University of Insubria, Busto Arsizio, Italy.

Publication information

Front Bioinform. 2025 Aug 12;5:1657841. doi: 10.3389/fbinf.2025.1657841. eCollection 2025.

Abstract

INTRODUCTION

i-Motifs (iMs) are cytosine-rich, four-stranded DNA structures with emerging roles in gene regulation and genome stability. Despite their biological relevance, genome-wide prediction of iM-forming sequences remains limited by low specificity and high false-positive rates, leading to considerable experimental burden.

METHOD

To address this, we developed a refined computational approach that prioritizes high-confidence iM candidates using a Position-Specific Similarity Matrix (PSSM) derived from multiple sequence alignments. The human reference genome (hg38) was scanned using a custom regular expression targeting cytosine-rich motifs, and each matching sequence was then scored with the PSSM. Statistical significance was assessed via permutation testing, one-sided t-tests, Benjamini-Hochberg correction, and Z-scores.
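The method names four steps: a regular-expression scan, PSSM scoring, a permutation null with Z-scores, and Benjamini-Hochberg correction. The abstract does not give the regex, the alignment, or the scoring details, so the following is a minimal sketch under stated assumptions, not the authors' pipeline. `IM_PATTERN` (four tracts of at least 3 C separated by 1-12 nt loops) is a hypothetical pattern chosen for illustration; the matrix is a standard log2-odds scoring matrix with a uniform 0.25 background and pseudocount 1, which may differ from the authors' similarity matrix; the null model shuffles each candidate; and all function names are hypothetical. The one-sided t-tests are not shown.

```python
import math
import random
import re

# Hypothetical C-rich pattern: four tracts of >=3 C separated by 1-12 nt loops.
# The authors' actual regular expression is not given in the abstract.
IM_PATTERN = re.compile(r"C{3,}(?:[ACGT]{1,12}C{3,}){3}")
BASES = "ACGT"


def find_candidates(seq, min_len=15, max_len=46):
    """Scan one strand for non-overlapping C-rich matches within a length window."""
    seq = seq.upper()  # hg38 FASTA soft-masks repeats in lowercase
    return [(m.start(), m.group()) for m in IM_PATTERN.finditer(seq)
            if min_len <= len(m.group()) <= max_len]


def build_pssm(aligned, pseudocount=1.0, background=0.25):
    """Log2-odds matrix (one dict per column) from equal-length aligned sequences."""
    pssm = []
    for i in range(len(aligned[0])):
        column = [s[i] for s in aligned if s[i] in BASES]  # gaps ('-') are skipped
        total = len(column) + 4 * pseudocount
        pssm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                     for b in BASES})
    return pssm


def best_score(seq, pssm):
    """Maximum PSSM score over all ungapped windows; requires len(seq) >= width."""
    w = len(pssm)
    return max(sum(pssm[j].get(seq[i + j], 0.0) for j in range(w))
               for i in range(len(seq) - w + 1))


def permutation_stats(seq, pssm, n_perm=200, seed=0):
    """Observed score, Z-score and empirical one-sided p-value vs. shuffled copies."""
    rng = random.Random(seed)
    observed = best_score(seq, pssm)
    null = []
    for _ in range(n_perm):
        chars = list(seq)
        rng.shuffle(chars)  # shuffling preserves base composition
        null.append(best_score("".join(chars), pssm))
    mean = sum(null) / n_perm
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / (n_perm - 1))
    z = (observed - mean) / sd if sd > 0 else 0.0  # sd == 0: every shuffle scored alike
    p = (1 + sum(x >= observed for x in null)) / (n_perm + 1)
    return observed, z, p


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy alignment standing in for an MSA of confirmed iMs (hypothetical data).
    aligned = ["CCCTACCCTACCCTA", "CCCAACCCAACCCAA", "CCCTTCCCTTCCCTT"]
    pssm = build_pssm(aligned)
    genome = "GGAT" + "CCCTAA" * 3 + "CCC" + "TAGGATTT"
    hits = find_candidates(genome)
    stats = [permutation_stats(s, pssm) for _, s in hits]
    qvals = benjamini_hochberg([p for _, _, p in stats])
    for (start, s), (score, z, p), q in zip(hits, stats, qvals):
        print(start, s, round(score, 2), round(z, 2), round(q, 3))
```

Only the plus strand is scanned here; C-rich motifs on the minus strand appear as G-rich runs on the plus strand and would need a reverse-complement pass. Because shuffling preserves base composition, a high Z-score in this sketch reflects how the bases are arranged rather than C content alone.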

RESULTS

This pipeline identified 37,075 candidate sequences (15-46 nucleotides) with strong iM-forming potential. Validation against experimentally confirmed iMs and known G-quadruplexes (G4s) demonstrated significant differences in alignment scores and sequence similarity, confirming structural specificity. A random forest classifier trained on nucleotide features further supported the distinctiveness of the candidates, achieving high classification performance.
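The abstract states only that the random forest was trained on "nucleotide features". A minimal sketch of how such features might be computed, assuming composition-style descriptors (length, mono- and dinucleotide frequencies, longest C-tract); the feature set and the helpers `nucleotide_features` and `feature_matrix` are hypothetical and are not the authors' feature set.

```python
import re
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]  # AA, AC, ..., TT


def nucleotide_features(seq):
    """Hypothetical features for one non-empty sequence: length, base and dinucleotide
    frequencies, and the longest run of consecutive C."""
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    runs = re.findall(r"C+", seq)
    features = {"length": n, "longest_c_run": max(map(len, runs), default=0)}
    features.update({f"freq_{b}": mono[b] / n for b in BASES})
    features.update({f"freq_{d}": di[d] / max(n - 1, 1) for d in DINUCS})
    return features


def feature_matrix(seqs):
    """Feature rows in a fixed, sorted column order, ready for a classifier."""
    rows = [nucleotide_features(s) for s in seqs]
    columns = sorted(rows[0])
    return columns, [[row[c] for c in columns] for row in rows]


if __name__ == "__main__":
    columns, X = feature_matrix(["CCCTAACCCTAACCCTAACCC", "GGGTTAGGGTTAGGGTTAGGG"])
    print(columns)
    print(X)
```

With scikit-learn installed, `X` and a matching label vector (for example 1 for iM candidates, 0 for G4 sequences) could be passed to `sklearn.ensemble.RandomForestClassifier().fit(X, y)`; the fixed column order keeps training and test matrices aligned.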

CONCLUSION

This work presents a scalable and statistically robust method to enrich for biologically relevant iM sequences, providing a valuable resource for future experimental validation and the rational design of ligands targeting iMs to modulate gene expression in contexts such as cancer.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/428a/12378704/d2bc79d2f4b5/fbinf-05-1657841-g001.jpg