Repetitive elements may comprise over two-thirds of the human genome.

Affiliation

Department of Biochemistry and Molecular Genetics, School of Medicine, University of Colorado, Aurora, Colorado, USA.

Publication information

PLoS Genet. 2011 Dec;7(12):e1002384. doi: 10.1371/journal.pgen.1002384. Epub 2011 Dec 1.

DOI:10.1371/journal.pgen.1002384
PMID:22144907
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3228813/
Abstract

Transposable elements (TEs) are conventionally identified in eukaryotic genomes by alignment to consensus element sequences. Using this approach, about half of the human genome has been previously identified as TEs and low-complexity repeats. We recently developed a highly sensitive alternative de novo strategy, P-clouds, that instead searches for clusters of high-abundance oligonucleotides that are related in sequence space (oligo "clouds"). We show here that P-clouds predicts >840 Mbp of additional repetitive sequences in the human genome, thus suggesting that 66%-69% of the human genome is repetitive or repeat-derived. To investigate this remarkable difference, we conducted detailed analyses of the ability of both P-clouds and a commonly used conventional approach, RepeatMasker (RM), to detect different sized fragments of the highly abundant human Alu and MIR SINEs. RM can have surprisingly low sensitivity for even moderately long fragments, in contrast to P-clouds, which has good sensitivity down to small fragment sizes (∼25 bp). Although short fragments have a high intrinsic probability of being false positives, we performed a probabilistic annotation that reflects this fact. We further developed "element-specific" P-clouds (ESPs) to identify novel Alu and MIR SINE elements, and using it we identified ∼100 Mb of previously unannotated human elements. ESP estimates of new MIR sequences are in good agreement with RM-based predictions of the amount that RM missed. These results highlight the need for combined, probabilistic genome annotation approaches and suggest that the human genome consists of substantially more repetitive sequence than previously believed.
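
The core idea in the abstract, grouping high-abundance oligonucleotides that are close in sequence space into "clouds" and then marking the genome positions they cover, can be illustrated with a small Python sketch. This is not the authors' P-clouds implementation: the function names, the Hamming-distance-1 neighbourhood, and the two thresholds (core_min, member_min) are assumptions chosen for illustration, and the real method presumably uses more elaborate expansion and annotation rules.

```python
from collections import Counter


def count_oligos(seq, k):
    """Count every overlapping oligo (k-mer) of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def neighbors(oligo):
    """Yield every oligo at Hamming distance 1 from the given oligo."""
    for i, base in enumerate(oligo):
        for alt in "ACGT":
            if alt != base:
                yield oligo[:i] + alt + oligo[i + 1:]


def build_clouds(counts, core_min, member_min):
    """Greedy clustering (illustrative assumption, not the published rules).

    Each unassigned oligo seen at least core_min times seeds a cloud.
    The cloud then grows through single-base neighbours that are seen
    at least member_min times and are not yet in another cloud.
    """
    assigned = {}  # oligo -> index of the cloud it belongs to
    clouds = []
    for oligo, n in counts.most_common():
        if n < core_min:
            break  # remaining oligos are too rare to seed a cloud
        if oligo in assigned:
            continue
        cloud_id = len(clouds)
        cloud = {oligo}
        assigned[oligo] = cloud_id
        frontier = [oligo]
        while frontier:
            current = frontier.pop()
            for nb in neighbors(current):
                if nb not in assigned and counts[nb] >= member_min:
                    assigned[nb] = cloud_id
                    cloud.add(nb)
                    frontier.append(nb)
        clouds.append(cloud)
    return clouds, assigned


def annotate(seq, k, assigned):
    """Return a per-base mask: True where any cloud oligo covers the base."""
    mask = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in assigned:
            for j in range(i, i + k):
                mask[j] = True
    return mask


if __name__ == "__main__":
    # Toy sequence: a short tandem repeat followed by unique sequence.
    seq = "ACGTTGCA" * 5 + "GGGCCCATAT"
    k = 4
    counts = count_oligos(seq, k)
    clouds, assigned = build_clouds(counts, core_min=4, member_min=2)
    mask = annotate(seq, k, assigned)
    print(f"{len(clouds)} clouds, {sum(mask)} of {len(seq)} bases marked repetitive")
```

Because the sketch never aligns against a consensus sequence, it can flag short repeated fragments that an alignment-based tool might miss. As the abstract notes, short matches also carry a higher risk of false positives, which is why the authors pair the approach with a probabilistic annotation.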

Figures
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6383/3228813/626127287ae5/pgen.1002384.g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6383/3228813/5ac7943b7b1b/pgen.1002384.g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6383/3228813/0d31bfe7cb20/pgen.1002384.g003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6383/3228813/12ccfd649c7b/pgen.1002384.g004.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6383/3228813/14c034f2de6b/pgen.1002384.g005.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6383/3228813/4f4853f4c8a1/pgen.1002384.g006.jpg
