Shanghai Engineering Research Center for Big Data in Pediatric Precision Medicine, Center for Biomedical Informatics, Shanghai Children's Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China.
Center of Excellence in Computational Molecular Biology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand.
World J Pediatr. 2024 Oct;20(10):1090-1101. doi: 10.1007/s12519-023-00788-6. Epub 2024 Feb 24.
Methylmalonic acidemia (MMA) is an autosomal recessive disorder with an estimated prevalence of 1:50,000. First-tier clinical diagnostic tests often return many false positives [five false positives (FP) for every true positive (TP)]. In this work, our goal was to refine a classification model that minimizes the number of false positives, currently an unmet need in the upstream diagnostics of MMA.
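To make the 5 FP : 1 TP figure concrete, the short sketch below converts it into a positive predictive value (PPV). The function name `positive_predictive_value` is an illustrative choice, not something taken from the study.

```python
def positive_predictive_value(true_positives, false_positives):
    """Fraction of positive screens that are true cases: TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)


# First-tier ratio quoted above: five false positives per true positive.
first_tier_ppv = positive_predictive_value(true_positives=1, false_positives=5)
print(f"First-tier PPV: {first_tier_ppv:.1%}")
```

Under that ratio, only about one in six first-tier positives is a true MMA case, which is the gap a second-tier model aims to narrow.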
We developed multivariable machine learning screening models for MMA, intended as a second-tier tool for reducing false positives. We used mass spectrometry-based features comprising 11 amino acids and 31 carnitines measured in dried blood samples from neonatal patients, and constructed additional ratio features from them. Feature selection strategies (selection by filter, recursive feature elimination, and learned vector quantization) determined the input set used to evaluate 14 classification models and to identify a candidate model set for developing an ensemble model.
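The ratio feature construction step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes all ordered pairwise ratios are built, whereas the abstract does not say which ratios were used. The analyte names (`C3`, `C2`, `Met`), the concentrations, and the small `eps` guard against division by zero are hypothetical choices made for the example.

```python
from itertools import permutations


def add_ratio_features(sample, eps=1e-9):
    """Return a copy of `sample` extended with all ordered pairwise ratios.

    `sample` maps analyte name -> measured concentration.
    `eps` is an assumed guard that avoids division by zero.
    """
    features = dict(sample)
    for numerator, denominator in permutations(sample, 2):
        features[f"{numerator}/{denominator}"] = (
            sample[numerator] / (sample[denominator] + eps)
        )
    return features


# Hypothetical concentrations in arbitrary units, for illustration only.
sample = {"C3": 6.0, "C2": 20.0, "Met": 15.0}
features = add_ratio_features(sample)
print(sorted(features))
```

With the full panel of 42 analytes, this all-pairs scheme would yield 42 × 41 = 1,722 ratio features, which is one reason a feature selection stage of the kind described above becomes necessary.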
Our work identified computational models that use metabolic analytes to reduce the number of false positives without compromising sensitivity. The best results [area under the receiver operating characteristic curve (AUROC) of 97%, sensitivity of 92%, and specificity of 95%] were obtained with an ensemble of random forest, C5.0, sparse linear discriminant analysis, and an autoencoder deep neural network, stacked with stochastic gradient boosting as the supervisor. The model achieved a good performance trade-off for a screening application: a 6% false-positive rate (FPR) at 95% sensitivity, 35% FPR at 99% sensitivity, and 39% FPR at 100% sensitivity.
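Operating points such as "FPR at 95% sensitivity" are typically read off by sweeping the decision threshold over the model's scores. The sketch below shows one way to do this; the function name `fpr_at_sensitivity` and the toy scores and labels are hypothetical and are not data from the study.

```python
def fpr_at_sensitivity(scores, labels, target_sensitivity):
    """Find the highest threshold whose sensitivity meets the target.

    scores: model outputs, higher means more likely positive.
    labels: 1 for a true case, 0 for a non-case.
    Returns (threshold, false_positive_rate), or None if unreachable.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Lowering the threshold can only raise sensitivity and FPR, so the
    # first threshold that meets the target also gives the lowest FPR.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if tp / n_pos >= target_sensitivity:
            return threshold, fp / n_neg
    return None


# Toy scores and labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for target in (0.75, 1.0):
    threshold, fpr = fpr_at_sensitivity(scores, labels, target)
    print(f"sensitivity >= {target:.0%}: threshold {threshold}, FPR {fpr:.0%}")
```

In this toy example, demanding full sensitivity doubles the FPR relative to the 75% target, the same kind of trade-off as the rise from 6% to 39% FPR reported above.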
The classification results and approach of this research can be used by clinicians globally to improve the overall detection of MMA in pediatric patients. The improved method, when adjusted to 100% precision, can further inform the diagnostic journey of MMA and help reduce the burden on patients and their families.