Department of Radiology, Laboratory of Medical Imaging and Computation, Massachusetts General Brigham and Harvard Medical School, Boston, MA, USA; Department of Laboratory Medicine, Hanyang University College of Medicine, Seoul, South Korea; GC Genome, GC Laboratories, Yong-in, South Korea.
Department of Radiology, Laboratory of Medical Imaging and Computation, Massachusetts General Brigham and Harvard Medical School, Boston, MA, USA.
Comput Biol Med. 2022 May;144:105332. doi: 10.1016/j.compbiomed.2022.105332. Epub 2022 Feb 24.
Although copy number variations (CNVs) are infrequent, each anomaly is unique, and multiple CNVs can occur simultaneously. Growing evidence suggests that CNVs contribute to a wide range of diseases. When a CNV is detected, assessing its clinical significance requires a thorough literature review, which can be extremely time-consuming and may delay diagnosis. We therefore developed CNV Extraction, Transformation, and Loading Artificial Intelligence (CNV-ETLAI), an innovative tool that helps experts classify and interpret CNVs accurately and efficiently.
We combined text, table, and image processing algorithms to develop an artificial intelligence platform that automatically extracts, transforms, and organizes CNV information into a database. To validate CNV-ETLAI, we compared its performance against ground-truth datasets labeled by a human expert. In addition, we analyzed CNV data collected with CNV-ETLAI through crowdsourcing.
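The extract-transform-load idea can be illustrated with a minimal Python sketch. It is not the CNV-ETLAI implementation, which combines text, table, and image processing; it assumes CNVs are written in a single ISCN-style microarray notation and handles plain text only. The regular expression, the functions `extract`, `transform`, and `load`, and the `cnv` table schema are all hypothetical.

```python
import re
import sqlite3

# Hypothetical pattern for one CNV notation only: ISCN-style microarray strings
# such as "arr[GRCh37] 22q11.21(18,000,000_21,000,000)x1". Real articles report
# CNVs in many other forms (tables, figures, prose), which this sketch ignores.
ISCN_PATTERN = re.compile(
    r"arr\[(?P<build>\w+)\]\s*"
    r"(?P<chrom>[0-9]{1,2}|X|Y)(?P<band>[pq][0-9.]+)"
    r"\((?P<start>[0-9,]+)_(?P<end>[0-9,]+)\)"
    r"x(?P<copies>[0-9]+)"
)


def extract(text):
    """Extract: return the named fields of every ISCN-style CNV found in the text."""
    return [match.groupdict() for match in ISCN_PATTERN.finditer(text)]


def transform(raw):
    """Transform: convert coordinates to integers and label the copy-number change."""
    start = int(raw["start"].replace(",", ""))
    end = int(raw["end"].replace(",", ""))
    copies = int(raw["copies"])
    # Simplification: assumes a two-copy (autosomal) baseline; chrX/chrY need sex-aware rules.
    if copies < 2:
        cnv_type = "deletion"
    elif copies > 2:
        cnv_type = "duplication"
    else:
        cnv_type = "copy-neutral"
    # Length assumes 1-based, inclusive start and end coordinates.
    return (raw["build"], "chr" + raw["chrom"], raw["band"],
            start, end, end - start + 1, copies, cnv_type)


def load(conn, records):
    """Load: insert the normalized records into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cnv (build TEXT, chrom TEXT, band TEXT, "
        "start_pos INTEGER, end_pos INTEGER, length_bp INTEGER, "
        "copies INTEGER, cnv_type TEXT)"
    )
    conn.executemany("INSERT INTO cnv VALUES (?, ?, ?, ?, ?, ?, ?, ?)", records)
    conn.commit()


# Made-up sentence for illustration; the coordinates are not from any study.
text = ("The proband carried arr[GRCh37] 22q11.21(18,000,000_21,000,000)x1, "
        "and a parent carried arr[GRCh37] 15q11.2(22,000,000_23,000,000)x3.")

conn = sqlite3.connect(":memory:")
load(conn, [transform(raw) for raw in extract(text)])
rows = conn.execute(
    "SELECT chrom, band, length_bp, cnv_type FROM cnv ORDER BY chrom"
).fetchall()
for row in rows:
    print(row)
```

Keeping the three stages as separate functions mirrors the ETL split: a table or image parser could, in principle, feed the same `transform` and `load` steps.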
Compared with a human expert, CNV-ETLAI improved CNV detection accuracy by 4% and performed the analysis 60 times faster. This performance can improve further as growing use expands the CNV-ETLAI database. In total, 5,800 CNVs were collected from 2,313 journal articles. Total CNV frequency per whole chromosome was highest for chromosome X, whereas CNV frequency per 1 Mb of genomic length was highest for chromosome 22.
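The two frequency measures differ only by normalization: a chromosome's per-Mb frequency is its CNV count divided by its length in megabases, so a short chromosome can rank first per Mb even with fewer CNVs in total. The Python sketch below assumes GRCh38 chromosome lengths (the abstract does not state the genome build) and uses made-up counts rather than the study's data; `cnv_per_mb` is a hypothetical helper.

```python
# Chromosome lengths in base pairs (GRCh38; the choice of build is an assumption).
CHROM_LENGTH_BP = {"chr22": 50_818_468, "chrX": 156_040_895}


def cnv_per_mb(counts, lengths_bp):
    """Return CNV counts normalized to events per 1 Mb of chromosome length."""
    return {chrom: n / (lengths_bp[chrom] / 1_000_000) for chrom, n in counts.items()}


# Made-up counts for illustration only; these are not the study's results.
hypothetical_counts = {"chrX": 600, "chr22": 250}
density = cnv_per_mb(hypothetical_counts, CHROM_LENGTH_BP)

# Rank chromosomes by normalized frequency, highest first.
for chrom, value in sorted(density.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{chrom}: {hypothetical_counts[chrom]} CNVs, {value:.2f} per Mb")
```

With these invented numbers, chrX has more CNVs in total, yet chr22 ranks first per Mb because it is roughly a third the length.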
We have developed, tested, and shared CNV-ETLAI for research and clinical purposes (https://lmic.mgh.harvard.edu/CNV-ETLAI). Use of CNV-ETLAI is expected to ease and accelerate diagnostic classification and interpretation of CNVs.