The DNA sequence and comparative analysis of human chromosome 10.

Author information

Deloukas P, Earthrowl M E, Grafham D V, Rubenfield M, French L, Steward C A, Sims S K, Jones M C, Searle S, Scott C, Howe K, Hunt S E, Andrews T D, Gilbert J G R, Swarbreck D, Ashurst J L, Taylor A, Battles J, Bird C P, Ainscough R, Almeida J P, Ashwell R I S, Ambrose K D, Babbage A K, Bagguley C L, Bailey J, Banerjee R, Bates K, Beasley H, Bray-Allen S, Brown A J, Brown J Y, Burford D C, Burrill W, Burton J, Cahill P, Camire D, Carter N P, Chapman J C, Clark S Y, Clarke G, Clee C M, Clegg S, Corby N, Coulson A, Dhami P, Dutta I, Dunn M, Faulkner L, Frankish A, Frankland J A, Garner P, Garnett J, Gribble S, Griffiths C, Grocock R, Gustafson E, Hammond S, Harley J L, Hart E, Heath P D, Ho T P, Hopkins B, Horne J, Howden P J, Huckle E, Hynds C, Johnson C, Johnson D, Kana A, Kay M, Kimberley A M, Kershaw J K, Kokkinaki M, Laird G K, Lawlor S, Lee H M, Leongamornlert D A, Laird G, Lloyd C, Lloyd D M, Loveland J, Lovell J, McLaren S, McLay K E, McMurray A, Mashreghi-Mohammadi M, Matthews L, Milne S, Nickerson T, Nguyen M, Overton-Larty E, Palmer S A, Pearce A V, Peck A I, Pelan S, Phillimore B, Porter K, Rice C M, Rogosin A, Ross M T, Sarafidou T, Sehra H K, Shownkeen R, Skuce C D, Smith M, Standring L, Sycamore N, Tester J, Thorpe A, Torcasso W, Tracey A, Tromans A, Tsolas J, Wall M, Walsh J, Wang H, Weinstock K, West A P, Willey D L, Whitehead S L, Wilming L, Wray P W, Young L, Chen Y, Lovering R C, Moschonas N K, Siebert R, Fechtel K, Bentley D, Durbin R, Hubbard T, Doucette-Stamm L, Beck S, Smith D R, Rogers J

Affiliation

The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

Publication information

Nature. 2004 May 27;429(6990):375-81. doi: 10.1038/nature02462.

DOI:10.1038/nature02462

PMID:15164054

Abstract

The finished sequence of human chromosome 10 comprises a total of 131,666,441 base pairs. It represents 99.4% of the euchromatic DNA and includes one megabase of heterochromatic sequence within the pericentromeric region of the short and long arm of the chromosome. Sequence annotation revealed 1,357 genes, of which 816 are protein coding, and 430 are pseudogenes. We observed widespread occurrence of overlapping coding genes (either strand) and identified 67 antisense transcripts. Our analysis suggests that both inter- and intrachromosomal segmental duplications have impacted on the gene count on chromosome 10. Multispecies comparative analysis indicated that we can readily annotate the protein-coding genes with current resources. We estimate that over 95% of all coding exons were identified in this study. Assessment of single base changes between the human chromosome 10 and chimpanzee sequence revealed nonsense mutations in only 21 coding genes with respect to the human sequence.
