Nakamura Wataru, Hirata Makoto, Oda Satoyo, Chiba Kenichi, Okada Ai, Mateos Raúl Nicolás, Sugawa Masahiro, Iida Naoko, Ushiama Mineko, Tanabe Noriko, Sakamoto Hiromi, Sekine Shigeki, Hirasawa Akira, Kawai Yosuke, Tokunaga Katsushi, Tsujimoto Shin-Ichi, Shiba Norio, Ito Shuichi, Yoshida Teruhiko, Shiraishi Yuichi
Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan.
Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan.
NPJ Genom Med. 2024 Feb 17;9(1):11. doi: 10.1038/s41525-024-00394-z.
Innovations in sequencing technology have led to the discovery of novel mutations that cause inherited diseases. However, many patients with suspected genetic diseases remain undiagnosed. Long-read sequencing technologies are expected to significantly improve the diagnostic rate by overcoming the limitations of short-read sequencing. In addition, Oxford Nanopore Technologies (ONT) offers adaptive sampling, a computationally driven target enrichment technology that makes intensive analysis of target gene regions more affordable than standard non-selective long-read sequencing. In this study, we developed an efficient computational workflow for target adaptive sampling long-read sequencing (TAS-LRS) and evaluated it on 33 genomes collected from patients with suspected hereditary cancer. Our workflow can identify single-nucleotide variants with nearly the same accuracy as the short-read platform and can elucidate complex forms of structural variation. We also newly identified several SINE-R/VNTR/Alu (SVA) elements affecting the APC gene in two patients with familial adenomatous polyposis, as well as their sites of origin. In addition, we demonstrated that off-target reads from adaptive sampling, which are typically discarded, can be used to accurately genotype common single-nucleotide polymorphisms (SNPs) across the entire genome, enabling the calculation of a polygenic risk score. Furthermore, we identified allele-specific MLH1 promoter hypermethylation in a patient with Lynch syndrome. In summary, our TAS-LRS workflow can simultaneously capture monogenic risk variants (including complex structural variations), the polygenic background, and epigenetic alterations, and will be an efficient platform for genetic disease research and diagnosis.
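A polygenic risk score is conventionally computed as a weighted sum over variants of the effect-allele dosage (0, 1 or 2 copies) multiplied by a per-allele effect size. The abstract does not state which SNP panel, weights, or handling of missing calls the authors used, so the sketch below is only a minimal illustration of that general calculation, assuming diploid genotype calls (such as those derived from off-target reads) are already available. The names `PrsWeight`, `effect_allele_dosage` and `polygenic_risk_score` are hypothetical, and skipping uncalled variants is an assumption; real pipelines may instead impute them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PrsWeight:
    """One scoring variant (hypothetical structure, not the authors' format)."""
    variant_id: str      # e.g. an rsID
    effect_allele: str   # allele whose copies are counted
    beta: float          # per-allele effect size, e.g. a log odds ratio


def effect_allele_dosage(genotype: tuple[str, str], effect_allele: str) -> int:
    """Count copies (0, 1 or 2) of the effect allele in a diploid genotype call."""
    return sum(1 for allele in genotype if allele == effect_allele)


def polygenic_risk_score(
    weights: list[PrsWeight],
    genotypes: dict[str, tuple[str, str]],
) -> float:
    """Sum beta * dosage over scoring variants.

    Assumption: variants with no genotype call are skipped, not imputed.
    """
    score = 0.0
    for w in weights:
        genotype = genotypes.get(w.variant_id)
        if genotype is None:
            continue  # no call for this variant in this sample
        score += w.beta * effect_allele_dosage(genotype, w.effect_allele)
    return score


if __name__ == "__main__":
    # Toy weights and genotype calls, for illustration only.
    weights = [
        PrsWeight("rs1", "A", 0.3),
        PrsWeight("rs2", "G", -0.2),
        PrsWeight("rs3", "T", 0.05),  # not called below, so it is skipped
    ]
    genotypes = {"rs1": ("A", "A"), "rs2": ("G", "C")}
    print(polygenic_risk_score(weights, genotypes))
```

In this toy example the score is 0.3 × 2 + (−0.2) × 1 ≈ 0.4. Because genome-wide coverage from off-target reads is sparse, how uncalled variants are handled (skipped as here, or imputed) can noticeably shift the score.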