Large-scale clustering of CAGE tag expression data.

Authors

Shimokawa Kazuro, Okamura-Oho Yuko, Kurita Takio, Frith Martin C, Kawai Jun, Carninci Piero, Hayashizaki Yoshihide

Affiliation

Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, Japan.

Publication

BMC Bioinformatics. 2007 May 21;8:161. doi: 10.1186/1471-2105-8-161.

DOI: 10.1186/1471-2105-8-161
PMID: 17517134
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1890301/

Abstract

BACKGROUND

Recent analyses suggest that many genes have multiple transcription start sites (TSSs) that are used differentially across tissues and cell lines. Using the cap analysis of gene expression (CAGE) method, we have identified a very large number of TSSs mapped onto the mouse genome. The standard hierarchical clustering algorithm produces easily interpretable tree diagrams, but it has difficulty processing such large volumes of TSS data, so a better method for computing and displaying the results is needed.
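
To make the scale problem concrete: agglomerative hierarchical clustering typically starts from the full set of pairwise distances between items. The back-of-the-envelope Python calculation below is only a sketch, assuming the distances are held once each as a condensed upper triangle of 64-bit floats (the layout SciPy's `pdist` returns, for example); the helper name `condensed_distance_bytes` is hypothetical. It estimates the memory that matrix alone would need for the 159,075 TSSs clustered in the Results.

```python
def condensed_distance_bytes(n_items, bytes_per_value=8):
    """Memory for all n*(n-1)/2 pairwise distances, each stored once (upper triangle).

    Assumes 8-byte (float64) values; this is an illustrative helper, not part of any library.
    """
    return n_items * (n_items - 1) // 2 * bytes_per_value


n_tss = 159_075  # number of TSSs clustered in the Results section
pairs = n_tss * (n_tss - 1) // 2
print(f"{pairs:,} pairwise distances")                         # 12,652,348,275 pairwise distances
print(f"{condensed_distance_bytes(n_tss) / 1e9:.1f} GB as float64")  # 101.2 GB as float64
```

Roughly 12.7 billion distances, about 101 GB, must be computed and held before the first merge even happens, and the merging itself then scales at least quadratically in the number of items.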

RESULTS

To benefit from the strengths of both approaches, we combine hierarchical and non-hierarchical clustering to cluster TSS expression profiles derived from a large amount of CAGE data. We processed genome-wide expression data comprising 159,075 TSSs from 127 RNA samples of various mouse organs and categorized them into 70-100 clusters. The clusters exhibited intriguing biological features: a cluster supergroup with a ubiquitous expression profile, tissue-specific patterns, a distinct distribution of non-coding RNA, and functional TSS groups.
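
The abstract does not state which non-hierarchical method is combined with hierarchical clustering, so the following is only a minimal sketch of the general two-stage idea under stated assumptions, not the authors' implementation. It assumes k-means (Lloyd's algorithm with a deterministic farthest-first initialization) first compresses the profiles into `k` centroids, and naive average-linkage hierarchical clustering then groups those centroids, so the expensive pairwise step runs on `k` items instead of on every TSS. Euclidean distance, the parameters `k` and `n_clusters`, and all function names are hypothetical choices for illustration.

```python
import numpy as np


def farthest_first_init(X, k):
    """Assumed seeding: start at row 0, then repeatedly take the row farthest
    from every centroid chosen so far (deterministic, no random seed)."""
    chosen = [0]
    d2 = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d2.argmax())
        chosen.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[chosen].astype(float)


def nearest_centroid(X, C):
    """Closest centroid per row via |x - c|^2 = |x|^2 - 2 x.c + |c|^2,
    which needs only an (n, k) array rather than (n, k, d)."""
    d2 = (X ** 2).sum(axis=1)[:, None] - 2.0 * X @ C.T + (C ** 2).sum(axis=1)[None, :]
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means (the assumed non-hierarchical stage); returns (centroids, labels)."""
    C = farthest_first_init(X, k)
    for _ in range(n_iter):
        labels = nearest_centroid(X, C)
        # New centroid = mean of its members; an empty cluster keeps its old centroid.
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return C, nearest_centroid(X, C)


def average_linkage(C, n_clusters):
    """Naive average-linkage agglomeration of the rows of C until n_clusters groups remain.
    Roughly cubic in len(C), affordable only because C holds k centroids, not all TSSs."""
    D = np.sqrt(((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=2))
    groups = [[i] for i in range(len(C))]
    while len(groups) > n_clusters:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = D[np.ix_(groups[a], groups[b])].mean()  # mean inter-group distance
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a] = groups[a] + groups[b]  # merge the closest pair of groups
        del groups[b]                      # b > a, so index a stays valid
    out = np.empty(len(C), dtype=int)
    for g, members in enumerate(groups):
        out[members] = g
    return out


def hybrid_cluster(X, k, n_clusters):
    """Stage 1: k-means compresses n profiles into k centroids.
    Stage 2: hierarchical clustering groups the k centroids.
    Each profile inherits the group of its centroid."""
    C, labels = kmeans(X, k)
    return average_linkage(C, n_clusters)[labels]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data only: three synthetic profile groups in 5 dimensions (not CAGE data).
    X = np.vstack([c + rng.normal(scale=0.3, size=(60, 5)) for c in (0.0, 10.0, 20.0)])
    print(np.bincount(hybrid_cluster(X, k=9, n_clusters=3)))
```

On the toy data each synthetic group should come out as a single cluster. Because the tree is built over the `k` centroids, it can still be drawn as a dendrogram, while the per-item work of the first stage grows only linearly in the number of profiles for a fixed `k`.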

CONCLUSION

Our approach greatly reduced the computational cost and is an appropriate solution for analyzing large-scale TSS usage data.

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54ad/1890301/c7cf0c2a74e6/1471-2105-8-161-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54ad/1890301/567993abd369/1471-2105-8-161-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54ad/1890301/4c7e10b8b5e3/1471-2105-8-161-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54ad/1890301/d98c3441aaff/1471-2105-8-161-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54ad/1890301/87be1b4a8d17/1471-2105-8-161-5.jpg
