City Brain Lab, DAMO Academy, Alibaba Group, Hangzhou, China.
Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong.
Sci Rep. 2022 May 30;12(1):8725. doi: 10.1038/s41598-022-12346-7.
Genome variant calling is a challenging yet critical task that underpins subsequent studies. Almost all existing methods rely on high-depth DNA sequencing data, and their performance drops substantially on low-depth data. Using public Oxford Nanopore (ONT) human data from the Genome in a Bottle (GIAB) Consortium, we trained a generative adversarial network for low-depth variant calling. Our method, denoted LDV-Caller, can project high-depth sequencing information from low-depth data. It achieves a 94.25% F1 score on low-depth data, whereas the state-of-the-art method achieves a 94.49% F1 score on data with twice the depth. As a result, the cost of genome-wide sequencing examinations could be reduced substantially. In addition, we validated the trained LDV-Caller model on 157 public severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) samples. The mean sequencing depth of these samples is 2982x. LDV-Caller yields a 92.77% F1 score using only 22x sequencing depth, which demonstrates that our method has the potential to analyze different species with only low-depth sequencing data.
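The F1 scores quoted in the abstract combine precision and recall of the called variants against a truth set. The abstract does not state how calls were matched to the truth set, so the following is only a minimal sketch of the standard F1 definition, assuming exact matching on (chromosome, position, reference allele, alternate allele). The function name `variant_f1` and the toy variants are hypothetical and are not taken from the paper.

```python
def variant_f1(true_variants, called_variants):
    """Return (precision, recall, F1) for called variants against a truth set.

    Assumption: a call counts as correct only if it matches a truth variant
    exactly. Real benchmarking tools may use more lenient matching.
    """
    truth = set(true_variants)
    calls = set(called_variants)

    tp = len(truth & calls)   # variants called and present in the truth set
    fp = len(calls - truth)   # variants called but absent from the truth set
    fn = len(truth - calls)   # truth variants that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: (chromosome, position, ref allele, alt allele)
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr2", 300, "T", "C"),
}
calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr3", 10, "A", "T"),   # false positive
}

precision, recall, f1 = variant_f1(truth, calls)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy case three of the four calls are correct and one truth variant is missed, so precision, recall, and F1 all equal 0.75.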