Singletrome enhances detection of long noncoding RNAs in single cell transcriptomes.

Authors

Rahman Raza Ur, Ahmad Iftikhar, Li Zixiu, Sparks Robert P, Ben Saad Amel, Mullen Alan C

Affiliations

Division of Gastroenterology, University of Massachusetts Chan Medical School, Worcester, MA, 01605, USA.

Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA.

Publication

Sci Rep. 2025 Aug 12;15(1):29542. doi: 10.1038/s41598-025-13528-9.

Abstract

Single cell RNA sequencing (scRNA-seq) has revolutionized the study of gene expression in individual cell types, but scRNA-seq studies have focused primarily on expression of protein-coding genes. Long noncoding RNAs (lncRNAs) are more diverse than protein-coding genes, yet remain underexplored in part because they are underrepresented in reference annotations applied to scRNA-seq. Merging annotations containing protein-coding and lncRNA genes is not sufficient, because the addition of lncRNA genes that overlap in sense and antisense with protein-coding genes will affect how reads are counted for both protein-coding and lncRNA genes. Here, we introduce Singletrome, a Singularity image that integrates protein-coding and lncRNA gene transfer format (GTF) annotations to generate enhanced annotations that take into account the sense and antisense overlap of annotated genes, maps scRNA-seq data, and produces files for downstream analysis and visualization. With Singletrome, we detected thousands of lncRNAs not included in GENCODE, clustered cell types based solely on lncRNA expression, and demonstrated that machine learning can predict cell type and disease through lncRNAs alone. This comprehensive annotation will allow mapping of lncRNA expression across cell types of the human body, facilitating the development of an atlas of human lncRNAs in health and disease with the ability to integrate new lncRNA annotations as they become available.
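
The overlap problem described in the abstract can be illustrated with a minimal Python sketch. This is not Singletrome's implementation: the `Gene` record, the helper functions, and all gene IDs and coordinates are hypothetical and exist only to show how lncRNA genes might be labeled by their sense or antisense overlap with protein-coding genes before two annotations are merged.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record with GTF-style 1-based inclusive coordinates."""
    gene_id: str
    chrom: str
    start: int
    end: int
    strand: str   # "+" or "-"
    biotype: str  # "protein_coding" or "lncRNA"


def overlaps(a: Gene, b: Gene) -> bool:
    """True if the two genes share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def classify_overlap(lnc: Gene, pc: Gene) -> str:
    """Label the overlap of a lncRNA with a protein-coding gene by strand."""
    if not overlaps(lnc, pc):
        return "none"
    return "sense" if lnc.strand == pc.strand else "antisense"


def classify_lncrnas(genes: list[Gene]) -> dict[str, list[str]]:
    """Map each lncRNA ID to the kinds of protein-coding overlap it has."""
    coding = [g for g in genes if g.biotype == "protein_coding"]
    result = {}
    for lnc in (g for g in genes if g.biotype == "lncRNA"):
        kinds = {classify_overlap(lnc, pc) for pc in coding} - {"none"}
        result[lnc.gene_id] = sorted(kinds) or ["none"]
    return result


# Invented example annotation (not real genes or coordinates).
genes = [
    Gene("PC1", "chr1", 1000, 5000, "+", "protein_coding"),
    Gene("LNC_A", "chr1", 4500, 6000, "+", "lncRNA"),  # same strand as PC1
    Gene("LNC_B", "chr1", 2000, 3000, "-", "lncRNA"),  # opposite strand
    Gene("LNC_C", "chr1", 9000, 9500, "+", "lncRNA"),  # no overlap
    Gene("LNC_D", "chr2", 1000, 5000, "+", "lncRNA"),  # different chromosome
]
labels = classify_lncrnas(genes)
for gene_id, kinds in labels.items():
    print(gene_id, kinds)
```

In a strand-aware counting scheme, reads falling in a sense overlap are ambiguous between the two genes, while an antisense overlap could be resolved by read strand. That distinction is why a plain concatenation of two GTF files can change the counts for both gene classes.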

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc5a/12344142/b1b88c114055/41598_2025_13528_Fig1_HTML.jpg
