Department of Informatics for Genomic Medicine, Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-Machi, Aoba-Ku, Sendai, Miyagi, 980-8575, Japan.
Department of Feto-Maternal Medical Science, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan.
Sci Rep. 2024 Mar 15;14(1):6292. doi: 10.1038/s41598-024-55914-9.
Many phenotyping algorithms for high-throughput cohort identification have recently been developed. Prospective genome cohort studies are critical resources for precision medicine, but precise cohort identification faces many hurdles. It is therefore important to develop phenotyping algorithms for cohort data collections. Hypertensive disorders of pregnancy (HDP) are a leading cause of maternal morbidity and mortality. In this study, we developed, applied, and validated rule-based phenotyping algorithms for HDP. Two phenotyping algorithms, algorithms 1 and 2, were developed according to American and Japanese guidelines, respectively, and applied to 22,452 pregnant women in the Birth and Three-Generation Cohort Study of the Tohoku Medical Megabank project. For precise cohort identification, we analyzed both structured data (e.g., laboratory and physiological tests) and unstructured clinical notes. The identified HDP subtypes were validated against reference standards. Algorithms 1 and 2 identified 7.93% and 8.08% of the subjects as having HDP, respectively, along with their HDP subtypes. Both algorithms achieved high positive predictive values (0.96 and 0.90 for algorithms 1 and 2, respectively). By developing and implementing these phenotyping algorithms, we overcame the hurdle of precise cohort identification and precisely identified HDP patients and their subtypes from a large-scale cohort data collection.
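The positive predictive value reported above is the fraction of algorithm-positive subjects that the reference standard confirms, PPV = TP / (TP + FP). The following is a minimal sketch of that calculation only. The function name `positive_predictive_value` and the toy labels are hypothetical and are not taken from the study; the abstract does not describe how the authors computed or implemented it.

```python
def positive_predictive_value(predicted, reference):
    """Return PPV = TP / (TP + FP) for two parallel sequences of booleans.

    predicted: algorithm output (True = flagged as HDP)
    reference: reference-standard label (True = confirmed HDP)
    """
    pairs = list(zip(predicted, reference))
    true_pos = sum(1 for p, r in pairs if p and r)
    false_pos = sum(1 for p, r in pairs if p and not r)
    if true_pos + false_pos == 0:
        # With no positive predictions the ratio is undefined.
        raise ValueError("no positive predictions; PPV is undefined")
    return true_pos / (true_pos + false_pos)


# Hypothetical toy labels for illustration only (not study data).
predicted = [True, True, True, True, False, False]
reference = [True, True, True, False, False, True]

ppv = positive_predictive_value(predicted, reference)
print(f"PPV = {ppv:.2f}")
```

Note that PPV depends only on the subjects the algorithm flags, so a case the algorithm misses (the last toy subject) does not change it; that would be captured by sensitivity instead.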