Zigarelli Angela, Jia Ziyang, Lee Hyunsun
Department of Mathematics and Statistics, University of Massachusetts Amherst, Newton, MA, United States.
JMIR Form Res. 2022 Mar 15;6(3):e29967. doi: 10.2196/29967.
Artificial intelligence and digital health care have advanced substantially, improving medical diagnosis and treatment during the prolonged COVID-19 global pandemic. In this study, we discuss the development of prediction models for the self-diagnosis of polycystic ovary syndrome (PCOS) using machine learning techniques.
We aim to develop self-diagnostic prediction models for PCOS for both potential patients and clinical providers. For potential patients, predictions are based only on noninvasive measures such as anthropometric measures, symptoms, age, and other lifestyle factors, so that the proposed prediction tool can be used conveniently without any laboratory or ultrasound test results. For clinical providers, who can access patients' medical test results, prediction models using all predictor variables can be adopted to help diagnose patients with PCOS. Throughout this paper, we call the former the patient model and the latter the provider model, and we compare the two using various error metrics.
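The difference between the two models comes down to which columns each one is allowed to see, and the comparison comes down to metrics computed from each model's predictions. A minimal sketch, assuming hypothetical column names (the public data set's actual headers may differ) and one common set of binary-classification metrics (not necessarily the exact set reported in the study):

```python
# Hypothetical column names; the real data set's headers may differ.
NONINVASIVE = ["age", "bmi", "weight_gain", "acne", "hirsutism",
               "acanthosis_nigricans", "irregular_cycle", "cycle_length",
               "fast_food"]
INVASIVE = ["follicles_left", "follicles_right", "amh"]

PATIENT_FEATURES = NONINVASIVE              # no laboratory or ultrasound results
PROVIDER_FEATURES = NONINVASIVE + INVASIVE  # all predictor variables


def select(records, features):
    """Turn a list of dict records into feature rows for the given columns."""
    return [[rec[f] for f in features] for rec in records]


def error_metrics(y_true, y_pred):
    """Binary metrics, with 1 meaning 'PCOS present' and 0 'absent'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(a, b):
        return a / b if b else 0.0  # avoid division by zero on empty classes

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Usage with made-up labels (not study data): every metric equals 0.75 here.
print(error_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0]))
```

Training the same classifier once on `select(records, PATIENT_FEATURES)` and once on `select(records, PROVIDER_FEATURES)`, then passing each set of predictions to `error_metrics`, gives a like-for-like comparison of the two models.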
In this retrospective study, we acquired and analyzed a publicly available data set containing health information, including PCOS status, for 541 women from 10 different hospitals in Kerala, India. We adopted the CatBoost method for classification, k-fold cross-validation to estimate model performance, and SHAP (Shapley Additive Explanations) values to explain the importance of each variable. In a subgroup study, we used k-means clustering and principal component analysis to split the data set into 2 distinct BMI subgroups and compared the prediction results and feature importance between the 2 subgroups.
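The evaluation loop and the subgroup split described above can be sketched in plain Python. Two assumptions apply: CatBoost itself is not reproduced (a nearest-centroid classifier stands in for it), and the subgroup split is shown as two-cluster k-means on BMI alone, omitting the principal component analysis step the study also used. All function names are hypothetical.

```python
import random
import statistics


def k_fold_indices(n, k=5, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k nearly equal folds (assumes n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Mean held-out accuracy; every row is tested exactly once."""
    scores = []
    for test in k_fold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return statistics.mean(scores)


# Hypothetical stand-in for CatBoost: a nearest-centroid classifier.
def fit_centroids(X, y):
    """Store the mean feature vector of each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def predict_centroids(model, X):
    """Assign each row to the class whose centroid is closest."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(model, key=lambda c: sq_dist(x, model[c])) for x in X]


def kmeans_two_groups(values, max_iter=100):
    """Two-cluster k-means on one variable (e.g., BMI); returns labels, centers."""
    lo, hi = min(values), max(values)  # deterministic starting centers
    if lo == hi:
        return [0] * len(values), (lo, hi)

    def assign(v):
        return 0 if abs(v - lo) <= abs(v - hi) else 1

    for _ in range(max_iter):
        groups = ([v for v in values if assign(v) == 0],
                  [v for v in values if assign(v) == 1])
        new = (statistics.mean(groups[0]), statistics.mean(groups[1]))
        if new == (lo, hi):  # centers stopped moving: converged
            break
        lo, hi = new
    return [assign(v) for v in values], (lo, hi)
```

With CatBoost installed, the `fit` and `predict` methods of its `CatBoostClassifier` could be wrapped and passed in place of `fit_centroids` and `predict_centroids`; the fold logic would stay the same.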
We achieved 81% to 82.5% accuracy in predicting PCOS status without any invasive measures in the patient models, and 87.5% to 90.1% accuracy using both noninvasive and invasive predictor variables in the provider models. Among noninvasive measures, acanthosis nigricans, acne, hirsutism, irregular menstrual cycle, length of menstrual cycle, weight gain, fast food consumption, and age were the more important variables in the models. Among medical test results, the numbers of follicles in the right and left ovaries and anti-Müllerian hormone ranked highly in feature importance. We also report more detailed results from the subgroup study.
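Feature-importance rankings like these are built from Shapley values. The SHAP library computes them efficiently for tree ensembles such as CatBoost; the brute-force sketch below instead evaluates the Shapley formula directly, assuming that a feature absent from a coalition takes a fixed baseline value, and ranks features by mean absolute Shapley value across rows (one common convention for global SHAP importance). The toy model and function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x.

    A feature 'absent' from a coalition is replaced by its baseline value
    (an assumption; SHAP's explainers define absence in several ways).
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def mean_abs_importance(f, X, baseline):
    """Global ranking score: mean |Shapley value| of each feature over rows."""
    per_row = [shapley_values(f, x, baseline) for x in X]
    return [sum(abs(r[i]) for r in per_row) / len(per_row)
            for i in range(len(baseline))]


# Hypothetical toy risk score with an interaction between features 1 and 2.
def toy_model(z):
    return z[0] + z[1] * z[2]


print(shapley_values(toy_model, [1, 1, 1], [0, 0, 0]))
```

For the toy model, feature 0 receives about 1.0 and the interaction is split evenly, about 0.5 each, between features 1 and 2. The values always sum to `f(x) - f(baseline)`, so the prediction is fully accounted for across features.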
The proposed prediction models are ultimately expected to serve as a convenient digital platform through which users can obtain a pre- or self-diagnosis and counseling on their risk of PCOS, with or without medical test results. The platform will let women access it conveniently at home, without delay, before they seek further medical care. Clinical providers can also use the proposed prediction tool to help diagnose PCOS in women.