Urteaga Iñigo, McKillop Mollie, Elhadad Noémie
Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027 USA.
Data Science Institute, Columbia University, New York, NY 10027 USA.
NPJ Digit Med. 2020 Jun 24;3:88. doi: 10.1038/s41746-020-0292-9. eCollection 2020.
Endometriosis is a systemic and chronic condition in women of childbearing age, yet it remains a highly enigmatic disease with unresolved questions: there are no known biomarkers and no established clinical stages. Here, we investigate the use of patient-generated health data and data-driven phenotyping to characterize endometriosis patient subtypes based on their reported signs and symptoms. We aim to learn endometriosis phenotypes in an unsupervised manner from self-tracking data collected on personal smartphones. We leverage data from an observational research study of over 4000 women with endometriosis who track their condition for more than 2 years. We extend a classical mixed-membership model to accommodate the idiosyncrasies of the data at hand, i.e., the multimodality and uncertainty of the self-tracked variables. By jointly modeling a wide range of observations (i.e., participant symptoms, quality of life, treatments), the proposed method identifies clinically relevant endometriosis subtypes. Experiments show that our method is robust to different hyperparameter choices and to the biases of self-tracking data (e.g., the wide variations in tracking frequency among participants). With this work, we show the promise of unsupervised learning of endometriosis subtypes from self-tracked data: the learned phenotypes align well with what is already known about the disease, and also suggest new clinically actionable findings. More generally, we argue that a continued research effort on unsupervised phenotyping methods with patient-generated health data, collected via new mobile and digital technologies, will have significant impact on the study of enigmatic diseases in particular, and on health in general.
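To make the modeling idea concrete, the following is a minimal sketch of the generative process behind a generic mixed-membership model with several categorical data types. It is not the authors' model: the abstract does not give the model specification, so the Dirichlet priors, the function names (`build_toy_model`, `sample_participant`), the modality names, and the vocabulary sizes are all illustrative assumptions. Each participant has a mixture over latent phenotypes, and each phenotype has one categorical distribution per modality; the number of observations may differ per participant, which loosely mirrors the variation in tracking frequency.

```python
import numpy as np


def build_toy_model(rng, n_phenotypes=3, vocab_sizes=None):
    """Draw per-phenotype categorical distributions for each modality.

    All sizes and modality names here are hypothetical, chosen only for illustration.
    """
    if vocab_sizes is None:
        vocab_sizes = {"symptoms": 6, "quality_of_life": 4, "treatments": 5}
    # For each modality: array of shape (n_phenotypes, vocab_size), rows sum to 1.
    return {
        modality: rng.dirichlet(np.ones(size), size=n_phenotypes)
        for modality, size in vocab_sizes.items()
    }


def sample_participant(rng, theta_prior, phi_by_modality, n_obs_by_modality):
    """Sample one participant's tracked items from the assumed generative process."""
    # Participant-level mixture over phenotypes (the "mixed membership").
    theta = rng.dirichlet(theta_prior)
    observations = {}
    for modality, phi in phi_by_modality.items():
        n_obs = n_obs_by_modality[modality]
        # One latent phenotype assignment per tracked observation.
        z = rng.choice(len(theta), size=n_obs, p=theta)
        # Each observation is drawn from the assigned phenotype's distribution
        # for this modality.
        items = np.array(
            [rng.choice(phi.shape[1], p=phi[k]) for k in z], dtype=int
        )
        observations[modality] = items
    return theta, observations


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phi = build_toy_model(rng)
    # A participant who tracks symptoms often, quality of life rarely,
    # and treatments not at all.
    counts = {"symptoms": 10, "quality_of_life": 2, "treatments": 0}
    theta, obs = sample_participant(rng, np.ones(3), phi, counts)
    print("phenotype mixture:", np.round(theta, 3))
    for modality, items in obs.items():
        print(modality, items)
```

Unsupervised phenotyping would run this process in reverse, inferring the phenotype distributions and participant mixtures from observed data; the inference procedure is not shown here.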