Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome.

Author information

Margulies Elliott H, Cooper Gregory M, Asimenos George, Thomas Daryl J, Dewey Colin N, Siepel Adam, Birney Ewan, Keefe Damian, Schwartz Ariel S, Hou Minmei, Taylor James, Nikolaev Sergey, Montoya-Burgos Juan I, Löytynoja Ari, Whelan Simon, Pardi Fabio, Massingham Tim, Brown James B, Bickel Peter, Holmes Ian, Mullikin James C, Ureta-Vidal Abel, Paten Benedict, Stone Eric A, Rosenbloom Kate R, Kent W James, Bouffard Gerard G, Guan Xiaobin, Hansen Nancy F, Idol Jacquelyn R, Maduro Valerie V B, Maskeri Baishali, McDowell Jennifer C, Park Morgan, Thomas Pamela J, Young Alice C, Blakesley Robert W, Muzny Donna M, Sodergren Erica, Wheeler David A, Worley Kim C, Jiang Huaiyang, Weinstock George M, Gibbs Richard A, Graves Tina, Fulton Robert, Mardis Elaine R, Wilson Richard K, Clamp Michele, Cuff James, Gnerre Sante, Jaffe David B, Chang Jean L, Lindblad-Toh Kerstin, Lander Eric S, Hinrichs Angie, Trumbower Heather, Clawson Hiram, Zweig Ann, Kuhn Robert M, Barber Galt, Harte Rachel, Karolchik Donna, Field Matthew A, Moore Richard A, Matthewson Carrie A, Schein Jacqueline E, Marra Marco A, Antonarakis Stylianos E, Batzoglou Serafim, Goldman Nick, Hardison Ross, Haussler David, Miller Webb, Pachter Lior, Green Eric D, Sidow Arend

Affiliation

Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.

Publication information

Genome Res. 2007 Jun;17(6):760-74. doi: 10.1101/gr.6034307.

Abstract

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.
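
The overlap statistics in the abstract (for example, the share of constrained sequence that does not overlap any experimentally identified element) come down to interval arithmetic on genomic coordinates. The following is a minimal sketch of that kind of calculation, not the authors' pipeline: the function names, the half-open coordinate convention, and the toy coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: intervals are half-open [start, end) on a single chromosome.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_bases(a: List[Interval], b: List[Interval]) -> int:
    """Count bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def fraction_unannotated(constrained: List[Interval],
                         annotated: List[Interval]) -> float:
    """Fraction of constrained bases that overlap no annotated element."""
    constrained = merge(constrained)
    covered = sum(end - start for start, end in constrained)
    if covered == 0:
        return 0.0
    return 1.0 - overlap_bases(constrained, annotated) / covered


# Toy coordinates, invented for illustration only (not data from the paper).
toy_constrained = [(100, 200), (300, 400), (350, 450)]
toy_annotated = [(150, 250), (400, 420)]
toy_fraction = fraction_unannotated(toy_constrained, toy_annotated)
```

Both inputs are merged first so that bases covered by several overlapping predictions are counted once; a per-base fraction would otherwise be inflated wherever constraint calls from different methods overlap.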
