Berry Genomics Corporation, Beijing, China.
Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu.
Mol Genet Genomic Med. 2020 Nov;8(11):e1488. doi: 10.1002/mgg3.1488. Epub 2020 Sep 22.
Methods for identifying copy number variations (CNVs) have matured rapidly. However, postdetection steps such as variant interpretation and reporting remain inefficient. To address this, we developed REDBot, an automated software package that directly generates accurate clinical diagnostic reports for prenatal and products of conception (POC) samples.
We applied natural language processing (NLP) methods with active learning to analyze 30,235 in-house historical clinical reports, and then developed clinical knowledge bases, evidence-based interpretation methods, and reporting criteria to support the entire postdetection pipeline.
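The abstract does not say which active learning strategy was used. As a minimal sketch only, assuming a least-confidence (uncertainty sampling) query step, the selection of sentences to send for manual labeling might look like the following. The function name `least_confident` and the probability values are hypothetical and do not come from REDBot.

```python
def least_confident(probabilities, k):
    """Return the indices of the k samples the classifier is least sure about.

    `probabilities` is a list of per-class probability lists, one per
    unlabeled sentence (hypothetical classifier output). Confidence is
    taken as the highest class probability for each sentence.
    """
    confidence = [max(p) for p in probabilities]
    # Sort sample indices from lowest to highest confidence.
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:k]


# Illustrative values only: four unlabeled sentences, two classes.
probs = [[0.90, 0.10], [0.55, 0.45], [0.60, 0.40], [0.99, 0.01]]
to_label = least_confident(probs, 2)
```

In an active learning loop, the selected sentences would be labeled by an annotator, added to the training set, and the classifier retrained before the next query round.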
From the 30,235 reports, we obtained 37,175 CNV-paragraph pairs. On these pairs, the active learning approaches achieved an average F1-score of 0.9466 in sentence classification. The overall accuracy of variant classification was 95.7%, 95.2%, and 100.0% in the retrospective, prospective, and clinical utility experiments, respectively.
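For readers unfamiliar with the metric, an average F1-score over sentence classes can be computed as sketched below. The abstract does not state the averaging scheme, so this sketch assumes an unweighted (macro) average over classes. The class labels and example data are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Compute F1 = 2*TP / (2*TP + FP + FN) for each class label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical sentence labels, for illustration only.
y_true = ["evidence", "conclusion", "conclusion", "other"]
y_pred = ["evidence", "conclusion", "other", "other"]
average_f1 = macro_f1(y_true, y_pred)
```

A support-weighted or micro average would give a different value on imbalanced classes, so the reported 0.9466 should be read against whichever scheme the full paper specifies.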
By integrating NLP methods into the CNV postdetection pipeline, REDBot provides a robust and rapid tool with clinical utility for prenatal and POC diagnosis.