Unexpected observations after mapping LongSAGE tags to the human genome.

Authors

Keime Céline, Sémon Marie, Mouchiroud Dominique, Duret Laurent, Gandrillon Olivier

Affiliation

Université de Lyon, Lyon, France.

Publication information

BMC Bioinformatics. 2007 May 15;8:154. doi: 10.1186/1471-2105-8-154.

DOI:10.1186/1471-2105-8-154

PMID:17504516

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1884178/

Abstract

BACKGROUND

SAGE has been used widely to study the expression of known transcripts, but much less to annotate new transcribed regions. LongSAGE produces tags that are sufficiently long to be reliably mapped to a whole-genome sequence. Here we used this property to study the position of human LongSAGE tags obtained from all public libraries. We focused mainly on tags that do not map to known transcripts.

RESULTS

Using a published error rate for SAGE libraries, we first removed the tags likely to result from sequencing errors. An unexpectedly large number of the remaining tags still did not match the genome sequence. Some of these correspond to parts of human mRNAs, such as polyA tails, junctions between two exons, and polymorphic regions of transcripts. Another non-negligible proportion can be attributed to contamination by murine transcripts and to residual sequencing errors. After applying these screens to obtain a highly reliable dataset, we studied the tags that map exactly once to the genome. Of these tags, 31% correspond to unannotated transcripts. The others map to known transcribed regions, but nearly half of them lie either antisense to these known transcripts or in new variants of them.
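As a rough illustration of the classification step described above (tags that do not match the genome, tags that map once, and tags that map several times), here is a minimal sketch. It is not the authors' pipeline, which this abstract does not describe. It assumes exact string matching on both strands, and the toy sequences and the names `count_hits` and `classify` are hypothetical.

```python
from collections import Counter

# Translation table for complementing a DNA sequence.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _occurrences(pattern: str, text: str) -> int:
    """Count exact occurrences of pattern in text, overlaps included."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def count_hits(tag: str, genome: str) -> int:
    """Count exact matches of a tag on either strand of the genome."""
    hits = _occurrences(tag, genome)
    rc = reverse_complement(tag)
    if rc != tag:  # avoid double-counting palindromic tags
        hits += _occurrences(rc, genome)
    return hits


def classify(tag: str, genome: str) -> str:
    """Label a tag as 'unmatched', 'unique' or 'multiple' (assumed labels)."""
    hits = count_hits(tag, genome)
    if hits == 0:
        return "unmatched"
    return "unique" if hits == 1 else "multiple"


# Toy data only: a real analysis would use full-length LongSAGE tags
# and a whole-genome index rather than a short string.
toy_genome = "TTCATGAAACTTGTTTCATGTT"
toy_tags = ["CATGAAAC", "CATGTT", "CATGGGGG"]
summary = Counter(classify(tag, toy_genome) for tag in toy_tags)
print(dict(summary))
```

Only the tags labelled `unique` here would go on to the annotation comparison the abstract reports; scanning a string is practical for toy data, whereas a genome-scale run would need an indexed aligner.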

CONCLUSION

We performed a comprehensive study of all publicly available human LongSAGE tags and carefully verified the reliability of these data. We identified the potential origin of many tags that did not match the human genome sequence. The properties of the remaining tags imply that the level of sequencing error may have been underestimated. The frequency of tags that match the genome sequence once but do not fall in an annotated exon suggests that the human transcriptome is much more complex than current human genome annotations show, with many new splicing variants and antisense transcripts. SAGE data are appropriate for mapping new transcripts to the genome, as demonstrated by the high rate at which the corresponding tags are cross-validated by other methods.
