Pan Liyan, Liu Guangjian, Mao Xiaojian, Liang Huiying
Institute of Pediatrics, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China.
Department of Genetics and Endocrinology, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China.
JAMIA Open. 2020 Dec 5;3(4):567-575. doi: 10.1093/jamiaopen/ooaa063. eCollection 2020 Dec.
The study aimed to develop simplified diagnostic models for identifying girls with central precocious puberty (CPP) that do not require the expensive and cumbersome gonadotropin-releasing hormone (GnRH) stimulation test, which is the gold standard for CPP diagnosis.
Female patients who developed secondary sexual characteristics before 8 years of age and had undergone a GnRH analog (GnRHa) stimulation test at a medical center in Guangzhou, China, were enrolled. Data from clinical visits, laboratory tests, and medical imaging examinations were collected. We first extracted features from unstructured data such as clinical reports and medical images. We then used the Extreme Gradient Boosting (XGBoost) classifier to develop models based on each single data source and on multisource data, classifying patients as CPP or non-CPP.
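The abstract does not say how the multisource features were combined. One common option is feature-level ("early") fusion, in which each patient's features from every source are concatenated into a single row before training the classifier. The sketch below assumes that approach; the function name `fuse_sources` and the feature names (`age_years`, `breast_tanner`, `basal_lh`) are hypothetical and not taken from the study. Missing features are filled with NaN because XGBoost treats NaN as a missing value by default.

```python
import math


def fuse_sources(sources, feature_order):
    """Concatenate features from several data sources into one row per patient.

    Hypothetical early-fusion sketch, not the study's actual pipeline.
    sources: list of dicts mapping patient_id -> {feature_name: value}
    feature_order: list of feature names that fixes the column order
    Returns (sorted patient IDs, list of feature rows). Features absent for a
    patient become NaN. If two sources share a feature name, the later source wins.
    """
    patient_ids = sorted(set().union(*(s.keys() for s in sources)))
    rows = []
    for pid in patient_ids:
        merged = {}
        for source in sources:
            merged.update(source.get(pid, {}))
        rows.append([merged.get(name, math.nan) for name in feature_order])
    return patient_ids, rows


# Illustrative, made-up values for two sources (clinical visit and basal labs).
clinical = {"p1": {"age_years": 7.2, "breast_tanner": 2}, "p2": {"age_years": 6.5}}
labs = {"p1": {"basal_lh": 0.4}, "p3": {"basal_lh": 0.1}}
ids, X = fuse_sources([clinical, labs], ["age_years", "breast_tanner", "basal_lh"])
```

The resulting matrix `X` could then be passed to a gradient-boosting classifier. A single-source model corresponds to calling `fuse_sources` with only one source.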
The model based on multisource data performed best, achieving an area under the curve (AUC) of 0.88 and a Youden index of 0.64. The performance of the single-source models built on basal laboratory test data, together with the feature importance of each variable, showed that the basal hormone test had the highest diagnostic value for CPP.
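Both reported metrics can be computed from a model's predicted scores. The AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and the Youden index is J = sensitivity + specificity - 1, usually taken at the threshold that maximizes it. The sketch below assumes binary labels (1 = CPP, 0 = non-CPP) and a "score >= threshold means positive" rule; the abstract does not state how the study chose its threshold, and the data are made up for illustration.

```python
import math


def roc_auc(labels, scores):
    """AUC as P(positive score > negative score), counting ties as 0.5.

    Pairwise O(n_pos * n_neg) version for illustration; assumes both classes are present.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_youden(labels, scores):
    """Return (J, threshold) maximizing J = sensitivity + specificity - 1.

    A case is called positive when its score >= threshold; candidate thresholds
    are the observed scores. Assumes both classes are present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_j, best_t


# Toy example: 3 positives, 3 negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)          # 8 of 9 pairs ranked correctly
j, threshold = best_youden(labels, scores)
```

On this toy data the AUC is 8/9 (about 0.89) and the best Youden index is 2/3, first reached at threshold 0.4.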
We developed three simplified models that use easily accessible clinical data, collected before the GnRH stimulation test, to identify girls at high risk of CPP. These models are tailored to the needs of patients in different clinical settings. Machine learning and multisource data fusion can support more accurate diagnosis than traditional methods.