Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight.

Affiliations

Department of Neuroscience, Mayo Clinic, Jacksonville, FL, 32224, USA.

Mayo Clinic Graduate School of Biomedical Sciences, Jacksonville, FL, 32224, USA.

Publication information

Genome Biol. 2019 May 20;20(1):97. doi: 10.1186/s13059-019-1707-2.

DOI: 10.1186/s13059-019-1707-2

PMID: 31104630

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6526621/

Abstract

BACKGROUND

The human genome contains "dark" gene regions that cannot be adequately assembled or aligned using standard short-read sequencing technologies, preventing researchers from identifying mutations within these gene regions that may be relevant to human disease. Here, we identify regions with few mappable reads, which we call "dark by depth," and regions with ambiguous alignment, which we call "camouflaged." We assess how well long-read or linked-read technologies resolve these regions.
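To make the two categories concrete, here is a minimal Python sketch that labels a single reference position from its read depth and the number of reads with low mapping quality (MAPQ). The thresholds (`MIN_DEPTH`, `LOW_MAPQ`, `MAPQ_FRACTION`), the `BaseStats` type, and the `classify` function are hypothetical illustrations, not the authors' published parameters or code. Treating every position dominated by low-MAPQ reads as camouflaged is also a simplification: ambiguous alignment stands in here for the underlying duplication.

```python
from dataclasses import dataclass

# Hypothetical thresholds chosen for illustration only (assumptions, not
# parameters stated in the abstract).
MIN_DEPTH = 5        # at or below this many reads, a position is "dark by depth"
LOW_MAPQ = 10        # reads with MAPQ below this are treated as ambiguously aligned
MAPQ_FRACTION = 0.9  # share of ambiguous reads needed to call a position "camouflaged"


@dataclass
class BaseStats:
    depth: int           # total reads covering the position
    low_mapq_reads: int  # reads at this position with MAPQ < LOW_MAPQ


def classify(stats: BaseStats) -> str:
    """Label one reference position as 'dark_by_depth', 'camouflaged', or 'resolved'."""
    # Depth is checked first, so uncovered positions never reach the division below.
    if stats.depth <= MIN_DEPTH:
        return "dark_by_depth"
    # Enough reads, but most of them could align equally well elsewhere.
    if stats.low_mapq_reads / stats.depth >= MAPQ_FRACTION:
        return "camouflaged"
    return "resolved"


positions = [BaseStats(3, 0), BaseStats(30, 29), BaseStats(30, 2)]
print([classify(p) for p in positions])
# ['dark_by_depth', 'camouflaged', 'resolved']
```

Checking depth before MAPQ means a sparsely covered position is reported as dark by depth even if its few reads are also ambiguous, which keeps the two labels mutually exclusive in this sketch.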

RESULTS

Based on standard whole-genome Illumina sequencing data, we identify 36,794 dark regions in 6054 gene bodies from pathways important to human health, development, and reproduction. Of these gene bodies, 8.7% are completely dark and 35.2% are ≥ 5% dark. We identify dark regions that are present in protein-coding exons across 748 genes. Linked-read or long-read sequencing technologies from 10x Genomics, PacBio, and Oxford Nanopore Technologies reduce dark protein-coding regions to approximately 50.5%, 35.6%, and 9.6%, respectively. We present an algorithm to resolve most camouflaged regions and apply it to the Alzheimer's Disease Sequencing Project. We rescue a rare ten-nucleotide frameshift deletion in CR1, a top Alzheimer's disease gene, found in disease cases but not in controls.
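The gene-body percentages above are ratios of counts. The sketch below, with hypothetical function names and made-up input data, shows one way to compute them from per-gene dark-base counts, along with a "remaining dark" percentage of the kind used to compare sequencing technologies. It assumes that a completely dark gene body also counts toward the ≥ 5% group, and that "reduce to X%" means X% of the bases that were dark with Illumina data remain dark; the abstract does not spell out either convention.

```python
def dark_fraction(dark_bases: int, gene_length: int) -> float:
    """Fraction of a gene body's bases that are dark (hypothetical helper)."""
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    return dark_bases / gene_length


def summarize(genes: dict[str, tuple[int, int]], threshold: float = 0.05) -> dict[str, float]:
    """Percent of gene bodies that are completely dark and that are >= `threshold` dark.

    `genes` maps a gene name to (dark_bases, gene_length). Completely dark genes
    are also counted in the at-least-threshold group (an assumption).
    """
    if not genes:
        raise ValueError("genes must not be empty")
    fractions = [dark_fraction(d, n) for d, n in genes.values()]
    total = len(fractions)
    completely_dark = sum(1 for f in fractions if f == 1.0)
    at_least = sum(1 for f in fractions if f >= threshold)
    return {
        "completely_dark_pct": 100 * completely_dark / total,
        "at_least_threshold_pct": 100 * at_least / total,
    }


def remaining_pct(dark_after: int, dark_before: int) -> float:
    """Percent of originally dark bases that stay dark with another technology."""
    if dark_before <= 0:
        raise ValueError("dark_before must be positive")
    return 100 * dark_after / dark_before


# Made-up example data, not values from the study.
example = {
    "GENE_A": (1000, 1000),  # completely dark
    "GENE_B": (60, 1000),    # 6% dark
    "GENE_C": (10, 1000),    # 1% dark
    "GENE_D": (0, 500),      # no dark bases
}
print(summarize(example))
# {'completely_dark_pct': 25.0, 'at_least_threshold_pct': 50.0}
print(remaining_pct(505, 1000))  # hypothetical base counts
# 50.5
```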

CONCLUSIONS

While we could not formally assess the association of the CR1 frameshift mutation with Alzheimer's disease due to insufficient sample size, we believe it merits investigation in a larger cohort. There remain thousands of potentially important genomic regions overlooked by short-read sequencing that are largely resolved by long-read technologies.

Similar articles

Long-read sequencing across the C9orf72 'GGGGCC' repeat expansion: implications for clinical use and genetic discovery efforts in human disease.

Mol Neurodegener. 2018 Aug 21;13(1):46. doi: 10.1186/s13024-018-0274-4.

Investigating the Performance of Oxford Nanopore Long-Read Sequencing with Respect to Illumina Microarrays and Short-Read Sequencing.

Int J Mol Sci. 2025 May 8;26(10):4492. doi: 10.3390/ijms26104492.

Assembly of Mitochondrial Genomes Using Nanopore Long-Read Technology in Three Sea Chubs (Teleostei: Kyphosidae).

Mol Ecol Resour. 2025 Jan;25(1):e14034. doi: 10.1111/1755-0998.14034. Epub 2024 Oct 15.

Benchmarking reveals superiority of deep learning variant callers on bacterial nanopore sequence data.

Elife. 2024 Oct 10;13:RP98300. doi: 10.7554/eLife.98300.

Investigating the dark-side of the genome: a barrier to human disease variant discovery?

Biol Res. 2023 Jul 20;56(1):42. doi: 10.1186/s40659-023-00455-0.

Illuminating the dark side of the human transcriptome with long read transcript sequencing.

BMC Genomics. 2020 Oct 30;21(1):751. doi: 10.1186/s12864-020-07123-7.

Initial Analysis of Structural Variation Detections in Cattle Using Long-Read Sequencing Methods.

Genes (Basel). 2022 May 6;13(5):828. doi: 10.3390/genes13050828.

Comparison of the two up-to-date sequencing technologies for genome assembly: HiFi reads of Pacific Biosciences Sequel II system and ultralong reads of Oxford Nanopore.

Gigascience. 2020 Dec 15;9(12). doi: 10.1093/gigascience/giaa123.

Comparison of long-read sequencing technologies in the hybrid assembly of complex bacterial genomes.

Microb Genom. 2019 Sep;5(9). doi: 10.1099/mgen.0.000294. Epub 2019 Aug 30.

Cited by

TCR germline diversity reveals evidence of natural selection on variable and joining alpha chain genes.

bioRxiv. 2025 Aug 24:2025.08.20.671277. doi: 10.1101/2025.08.20.671277.

Long-read sequencing of trios reveals increased germline and postzygotic mutation rates in repetitive DNA.

bioRxiv. 2025 Jul 19:2025.07.18.665621. doi: 10.1101/2025.07.18.665621.

Whole-genome variant detection in long-read sequencing data from ultra-low input patient samples.

medRxiv. 2025 Jul 27:2025.07.25.25332067. doi: 10.1101/2025.07.25.25332067.

Human-specific gene expansions contribute to brain evolution.

Cell. 2025 Jul 18. doi: 10.1016/j.cell.2025.06.037.

Sequencing the gaps: dark genomic regions persist in CHM13 despite long-read advances.

bioRxiv. 2025 May 28:2025.05.23.655776. doi: 10.1101/2025.05.23.655776.

Investigating the Performance of Oxford Nanopore Long-Read Sequencing with Respect to Illumina Microarrays and Short-Read Sequencing.

Int J Mol Sci. 2025 May 8;26(10):4492. doi: 10.3390/ijms26104492.

Evaluation of Illumina and Oxford Nanopore Sequencing for the Study of DNA Methylation in Alzheimer's Disease and Frontotemporal Dementia.

Int J Mol Sci. 2025 Apr 28;26(9):4198. doi: 10.3390/ijms26094198.

Identifying genetic errors of immunity due to mosaicism.

J Exp Med. 2025 May 5;222(5). doi: 10.1084/jem.20241045. Epub 2025 Apr 15.

Genome-wide profiling of highly similar paralogous genes using HiFi sequencing.

Nat Commun. 2025 Mar 8;16(1):2340. doi: 10.1038/s41467-025-57505-2.

The human immunoglobulin heavy chain constant gene locus is enriched for large complex structural variants and coding polymorphisms that vary in frequency among human populations.

bioRxiv. 2025 Feb 12:2025.02.12.634878. doi: 10.1101/2025.02.12.634878.
