Genome reference and sequence variation in the large repetitive central exon of human MUC5AC.

Affiliations

1 Cystic Fibrosis/Pulmonary Research and Treatment Center, and.

Publication information

Am J Respir Cell Mol Biol. 2014 Jan;50(1):223-32. doi: 10.1165/rcmb.2013-0235OC.

Abstract

Despite modern sequencing efforts, the difficulty in assembly of highly repetitive sequences has prevented resolution of human genome gaps, including some in the coding regions of genes with important biological functions. One such gene, MUC5AC, encodes a large, secreted mucin, which is one of the two major secreted mucins in human airways. The MUC5AC region contains a gap in the human genome reference (hg19) across the large, highly repetitive, and complex central exon. This exon is predicted to contain imperfect tandem repeat sequences and multiple conserved cysteine-rich (CysD) domains. To resolve the MUC5AC genomic gap, we used high-fidelity long PCR followed by single molecule real-time (SMRT) sequencing. This technology yielded long sequence reads and robust coverage that allowed for de novo sequence assembly spanning the entire repetitive region. Furthermore, we used SMRT sequencing of PCR amplicons covering the central exon to identify genetic variation in four individuals. The results demonstrated the presence of segmental duplications of CysD domains, insertions/deletions (indels) of tandem repeats, and single nucleotide variants. Additional studies demonstrated that one of the identified tandem repeat insertions is tagged by nonexonic single nucleotide polymorphisms. Taken together, these data illustrate the utility of long SMRT sequencing reads for de novo assembly of large repetitive sequences to fill gaps in the human genome. Characterization of the MUC5AC gene and the sequence variation in the central exon will facilitate genetic and functional studies of this critical airway mucin.
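Two ideas in the abstract can be made concrete with a toy example. First, a read can only resolve a repeat if it spans the whole repetitive region and is anchored in unique flanking sequence on both sides. Second, a tandem repeat indel between two alleles appears as a difference in repeat copy number. The Python sketch below illustrates both under loud assumptions. It is not the authors' assembly or variant-calling pipeline. It counts only perfect, consecutive copies, whereas the MUC5AC central exon contains imperfect repeats. The repeat unit `TTCACC`, the flanking sequences, the 50-base anchor length, and all function names are hypothetical.

```python
# Toy illustration only (hypothetical names and sequences, not the published method).


def read_spans_repeat(read_start: int, read_end: int,
                      repeat_start: int, repeat_end: int,
                      min_anchor: int = 50) -> bool:
    """Return True if a read covers the whole repeat plus at least `min_anchor`
    bases of flanking sequence on each side.

    All intervals are half-open [start, end) on a shared coordinate system.
    `min_anchor` is an arbitrary, assumed threshold.
    """
    return (read_start <= repeat_start - min_anchor
            and read_end >= repeat_end + min_anchor)


def count_tandem_copies(seq: str, unit: str) -> int:
    """Return the largest run of consecutive, *perfect* copies of `unit` in `seq`.

    Simplification: real MUC5AC repeats are imperfect and would need
    alignment-based matching rather than exact string comparison.
    """
    best = 0
    for start in range(len(seq)):
        copies, pos = 0, start
        # Walk forward one unit at a time while the next unit matches exactly.
        while seq.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
    return best


def tandem_repeat_indel(ref_seq: str, alt_seq: str, unit: str) -> int:
    """Copy-number difference of `unit` (alt minus ref).

    A positive value means a repeat insertion and a negative value a deletion.
    """
    return count_tandem_copies(alt_seq, unit) - count_tandem_copies(ref_seq, unit)


# Hypothetical alleles: identical unique flanks around 3 vs. 4 repeat copies.
UNIT = "TTCACC"
LEFT_FLANK, RIGHT_FLANK = "ACGTTGCA", "GGATCCAA"
REF = LEFT_FLANK + UNIT * 3 + RIGHT_FLANK
ALT = LEFT_FLANK + UNIT * 4 + RIGHT_FLANK

if __name__ == "__main__":
    print(count_tandem_copies(REF, UNIT))       # 3
    print(count_tandem_copies(ALT, UNIT))       # 4
    print(tandem_repeat_indel(REF, ALT, UNIT))  # 1 (one-unit insertion)
    # A long read anchored on both sides spans the repeat; a short one does not.
    print(read_spans_repeat(0, 1000, 100, 800))  # True
    print(read_spans_repeat(0, 500, 100, 800))   # False
```

Exact string matching keeps the sketch short. A realistic analysis of an imperfect repeat would align reads or assembled alleles and tolerate mismatches within each repeat unit.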
