Chen Wei-Hsin, Hsieh Sheau-Ling, Hsu Kai-Ping, Chen Han-Ping, Su Xing-Yu, Tseng Yi-Ju, Chien Yin-Hsiu, Hwu Wuh-Liang, Lai Feipei
National Taiwan University, Graduate Institute of Biomedical Electronics and Bioinformatics, Taipei, Taiwan.
J Med Internet Res. 2013 May 23;15(5):e98. doi: 10.2196/jmir.2495.
A hospital information system (HIS) that integrates screening data and interpretation of the data is routinely requested by hospitals and parents. However, the accuracy of disease classification may be low because of the disease characteristics and the analytes used for classification.
The objective of this study was to describe a system that enhances the neonatal screening system of the Newborn Screening Center at the National Taiwan University Hospital. The system was designed and deployed according to a service-oriented architecture (SOA) framework in the .NET Web services environment. It provides sample collection, testing, diagnosis, evaluation, treatment, and follow-up services among collaborating hospitals. To improve the accuracy of newborn screening, machine learning and optimal feature selection mechanisms were investigated for screening newborns for inborn errors of metabolism.
The framework of the Newborn Screening Hospital Information System (NSHIS) used embedded Health Level Seven (HL7) standards for data exchange among heterogeneous platforms, which were integrated through Web services written in C#. In this study, machine learning classification was used to predict phenylketonuria (PKU), hypermethioninemia, and 3-methylcrotonyl-CoA-carboxylase (3-MCC) deficiency. The classification methods used 347,312 newborn dried blood samples collected at the Center between 2006 and 2011. Of these, 220 newborns had values over the diagnostic cutoffs (positive cases) and 1557 had values that were over the screening cutoffs but did not meet the diagnostic cutoffs (suspected cases). The original 35 analytes and the manifested features were ranked by F score, and combinations of the top 20 ranked features were then used as input features to support vector machine (SVM) classifiers to obtain optimal feature sets. These feature sets were tested using 5-fold cross-validation, and optimal models were generated. The data collected in 2011 were used as the cases to be predicted.
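The F-score ranking step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define the F score, so the sketch assumes the common definition used for SVM feature selection (squared distances of the class means from the overall mean, divided by the sum of the per-class sample variances). The function names, the analyte names `analyte_a` and `analyte_b`, and the toy values are all hypothetical.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F score of one feature, given its values in positive and negative samples.

    Assumed definition (not stated in the abstract):
    ((mean_pos - mean_all)^2 + (mean_neg - mean_all)^2) / (var_pos + var_neg),
    with sample variances. Each class needs at least two values.
    """
    overall = mean(pos + neg)
    numerator = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    denominator = variance(pos) + variance(neg)
    if denominator == 0:
        # Degenerate case: no within-class spread at all.
        return 0.0 if numerator == 0 else float("inf")
    return numerator / denominator


def rank_features(samples, labels, names):
    """Return (name, score) pairs sorted from most to least discriminative."""
    scores = []
    for j, name in enumerate(names):
        pos = [row[j] for row, y in zip(samples, labels) if y == 1]
        neg = [row[j] for row, y in zip(samples, labels) if y == 0]
        scores.append((name, f_score(pos, neg)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: two analytes measured in six newborns.
names = ["analyte_a", "analyte_b"]
samples = [
    [1.0, 5.0], [1.2, 4.0], [0.9, 6.0],   # negative cases
    [3.0, 5.5], [3.2, 4.5], [2.9, 5.0],   # positive cases
]
labels = [0, 0, 0, 1, 1, 1]

ranked = rank_features(samples, labels, names)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy data `analyte_a` separates the two classes and `analyte_b` has identical class means, so `analyte_a` ranks first. In the workflow described in the abstract, the top-ranked features would then be combined into candidate subsets and each subset evaluated with an SVM under 5-fold cross-validation; that step needs an SVM library and is not shown here.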
The feature selection strategies were implemented and the optimal markers for PKU, hypermethioninemia, and 3-MCC deficiency were obtained. The results of the machine learning approach were compared with those of the cutoff scheme. The number of false-positive cases was reduced from 21 to 2 for PKU, from 30 to 10 for hypermethioninemia, and from 209 to 46 for 3-MCC deficiency.
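To put these counts on a common scale, the relative reduction in false positives can be computed directly from the figures reported above. The sketch below only does that arithmetic. The variable names are illustrative, and the percentages are derived here, not reported in the abstract.

```python
# False-positive counts reported above: (cutoff scheme, machine learning approach).
false_positives = {
    "PKU": (21, 2),
    "hypermethioninemia": (30, 10),
    "3-MCC deficiency": (209, 46),
}

# Relative reduction in percent: (before - after) / before * 100.
reduction = {
    disease: (before - after) / before * 100
    for disease, (before, after) in false_positives.items()
}

for disease, pct in reduction.items():
    before, after = false_positives[disease]
    print(f"{disease}: {before} -> {after} ({pct:.1f}% fewer false positives)")
```

This gives reductions of roughly 90.5% for PKU, 66.7% for hypermethioninemia, and 78.0% for 3-MCC deficiency.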
This SOA Web service-based newborn screening system can accelerate screening procedures effectively and efficiently. An SVM learning methodology for classifying the metabolic diseases PKU, hypermethioninemia, and 3-MCC deficiency, including optimal feature selection strategies, is presented. By adopting the results of this study, the number of suspected cases could be reduced dramatically.