Institute of Digestive Disease, Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, Chinese University of Hong Kong, Shenzhen Research Institute, Sha Tin, New Territories, Hong Kong.
Institute of Digestive Disease, Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, Chinese University of Hong Kong, Shenzhen Research Institute, Sha Tin, New Territories, Hong Kong; Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore.
Gastroenterology. 2022 Sep;163(3):699-711. doi: 10.1053/j.gastro.2022.05.048. Epub 2022 Jun 6.
BACKGROUND & AIMS: The lack of viral reference genomes poses a challenge to virome studies. We investigated the human gut virome and its clinical implications by ultra-deep metagenomic sequencing.
METHODS: We extracted sufficient viral DNA from human feces for ultra-deep PacBio sequencing (>10 μg) and Illumina sequencing (>1 μg). After de novo assembly and 6 stages of strict filtering, viral genomes were generated and validated in 3 cohorts comprising 2819 published fecal metagenomes. The diagnostic performance of the assembled viruses for colorectal cancer was tested in a training cohort and 2 independent validation cohorts. Virus mapping ratio, evolutionary history, and virus status (lytic or temperate) were also examined.
RESULTS: The mean amount of extracted viral DNA increased 14-fold compared with previous protocols. We obtained PacBio long reads and Illumina short reads at 290-fold higher depth than previous studies. We assembled and validated 1178 contigs as complete viral genomes, of which 1058 were newly identified. We discovered 13 viral genomes (398-839 kb) that are longer than the largest bacteriophage previously found in humans (393 kb). A phylogenetic tree was constructed based on hidden Markov model alignment scores of 4 conserved viral proteins. Incorporating our assembled genomes into the National Center for Biotechnology Information database improved the mapping ratio of published metagenomes by up to 18-fold. Lytic viruses predominated in our samples (75.9% ± 12.2% of total). A biomarker panel of 14 novel viruses discriminated patients with colorectal cancer from controls with an area under the receiver operating characteristic curve of 0.87 in the training cohort, which was validated with areas under the receiver operating characteristic curve of 0.85 and 0.73 in 2 independent cohorts.
CONCLUSIONS: We uncovered 1058 novel human gut viruses. These findings can contribute to clinical diagnosis, to the current collection of viral reference genomes, and to future virome investigation.
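For readers unfamiliar with the metric, the area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen patient receives a higher panel score than a randomly chosen control. The following is a minimal sketch of that pairwise definition, not the authors' analysis pipeline; the function name `auroc` and the toy scores and labels are hypothetical and are not data from the study.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    labels: 1 for a case (e.g. colorectal cancer), 0 for a control.
    A tie between a case score and a control score counts as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    # Fraction of case-control pairs ranked correctly.
    return wins / (len(cases) * len(controls))


# Hypothetical toy panel scores (NOT data from the study).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(auroc(toy_scores, toy_labels))
```

In the toy data, 8 of the 9 case-control pairs are ranked correctly, so the function returns 8/9. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.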