Reducing False-Positive Results in Newborn Screening Using Machine Learning

Authors

Peng Gang, Tang Yishuo, Cowan Tina M, Enns Gregory M, Zhao Hongyu, Scharfe Curt

Affiliations

Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA.

Department of Biostatistics, Yale University School of Public Health, New Haven, CT 06520, USA.

Publication information

Int J Neonatal Screen. 2020 Mar;6(1). doi: 10.3390/ijns6010016. Epub 2020 Mar 3.

DOI: 10.3390/ijns6010016

PMID: 32190768

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7080200/

Abstract

Newborn screening (NBS) for inborn metabolic disorders is a highly successful public health program that by design is accompanied by false-positive results. Here we trained a Random Forest machine learning classifier on screening data to improve prediction of true and false positives. Data included 39 metabolic analytes detected by tandem mass spectrometry and clinical variables such as gestational age and birth weight. Analytical performance was evaluated for a cohort of 2777 screen positives reported by the California NBS program, which consisted of 235 confirmed cases and 2542 false positives for one of four disorders: glutaric acidemia type 1 (GA-1), methylmalonic acidemia (MMA), ornithine transcarbamylase deficiency (OTCD), and very long-chain acyl-CoA dehydrogenase deficiency (VLCADD). Without changing the sensitivity to detect these disorders in screening, Random Forest-based analysis of all metabolites reduced the number of false positives for GA-1 by 89%, for MMA by 45%, for OTCD by 98%, and for VLCADD by 2%. All primary disease markers and previously reported analytes such as methionine for MMA and OTCD were among the top-ranked analytes. Random Forest's ability to classify GA-1 false positives was found similar to results obtained using Clinical Laboratory Integrated Reports (CLIR). We developed an online Random Forest tool for interpretive analysis of increasingly complex data from newborn screening.
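The abstract reports false-positive reductions "without changing the sensitivity". One way to read that figure is sketched below: given a classifier score for each screen positive, place the cutoff at the lowest score of any confirmed case, so that every confirmed case is still flagged, and count the false positives that fall below it. This is only an illustrative sketch. The function name `false_positive_reduction`, the scores, and the labels are hypothetical, and the abstract does not state that the authors chose their cutoff this way.

```python
def false_positive_reduction(scores, labels):
    """Fraction of false positives removed while keeping every confirmed case.

    scores: classifier score per screen positive (higher = more likely a true case).
    labels: 1 for a confirmed case, 0 for a false positive.
    Both inputs are hypothetical illustration data, not values from the paper.
    """
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    fp_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not case_scores or not fp_scores:
        raise ValueError("need at least one confirmed case and one false positive")

    # Assumption: the cutoff sits at the lowest-scoring confirmed case,
    # so no confirmed case is lost (sensitivity is unchanged).
    threshold = min(case_scores)

    # False positives scoring strictly below the cutoff would no longer be flagged.
    removed = sum(1 for s in fp_scores if s < threshold)
    return threshold, removed / len(fp_scores)


# Made-up scores for three confirmed cases followed by five false positives.
scores = [0.9, 0.8, 0.6, 0.7, 0.3, 0.2, 0.1, 0.65]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
threshold, reduction = false_positive_reduction(scores, labels)
print(f"cutoff={threshold}, false positives removed={reduction:.0%}")
```

In this toy data the two false positives scoring 0.7 and 0.65 remain flagged because they score above the lowest confirmed case. A disorder whose false positives overlap heavily with confirmed cases would therefore show little reduction under this reading, in line with the small figure reported for VLCADD.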

