Department of Epidemiology, Brown University School of Public Health, Women and Infants Hospital of Rhode Island, Department of Pediatrics, Brown Alpert Medical School, and the Center for Computational Molecular Biology, Providence, Rhode Island; and the Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, Connecticut.
Obstet Gynecol. 2014 Jun;123(6):1155-1161. doi: 10.1097/AOG.0000000000000293.
To identify candidate genes and genetic variants for preeclampsia using a bioinformatic approach to extract and organize genes and variants from the published literature.
Semantic data mining and natural language processing were used to identify articles from the published literature meeting criteria for potential association with preeclampsia. Articles were manually reviewed by trained curators. Cluster analysis was used to aggregate the extracted genes into gene sets associated with preeclampsia or severe preeclampsia, early or late preeclampsia, maternal or fetal tissue sources, and concurrent conditions (ie, fetal growth restriction, gestational hypertension, or HELLP syndrome [hemolysis, elevated liver enzymes, and low platelet count]). Gene Ontology was used to organize this large group of genes into ontology groups.
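The abstract does not specify the clustering method, so the following is only a minimal sketch of a simpler related step: grouping curated (gene, phenotype category) associations into one gene set per category and measuring how much the sets overlap with the Jaccard index. The records, the placeholder gene symbols, and the helper names (`build_gene_sets`, `jaccard`, `pairwise_overlap`) are hypothetical and are not data or code from the study.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical curated records: (gene, phenotype category) pairs accepted after review.
# GENE_A..GENE_E are placeholders, not genes reported by the study.
records = [
    ("GENE_A", "early-onset"),
    ("GENE_B", "early-onset"),
    ("GENE_C", "early-onset"),
    ("GENE_B", "late-onset"),
    ("GENE_D", "late-onset"),
    ("GENE_A", "HELLP"),
    ("GENE_E", "HELLP"),
]


def build_gene_sets(pairs):
    """Aggregate (gene, category) pairs into one set of genes per category."""
    sets = defaultdict(set)
    for gene, category in pairs:
        sets[category].add(gene)
    return dict(sets)


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(gene_sets):
    """Map each pair of categories to (sorted shared genes, Jaccard index)."""
    result = {}
    for c1, c2 in combinations(sorted(gene_sets), 2):
        shared = gene_sets[c1] & gene_sets[c2]
        result[(c1, c2)] = (sorted(shared), jaccard(gene_sets[c1], gene_sets[c2]))
    return result


if __name__ == "__main__":
    gene_sets = build_gene_sets(records)
    for pair, (shared, score) in pairwise_overlap(gene_sets).items():
        print(pair, shared, round(score, 2))
```

A low Jaccard index between two categories corresponds to the "distinct" gene sets the abstract describes, and shared genes correspond to the "overlapping" contributions; a real analysis would apply an actual clustering method to the full curated data.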
From more than 22 million records in PubMed, with 28,000 articles on preeclampsia, our data-mining tool identified 2,300 articles with potential genetic associations with preeclampsia-related phenotypes. After curation, 729 articles were "accepted" that contained "statistically significant" associations with 535 genes. We saw distinct segregation of these genes by severity and timing of preeclampsia, by maternal or fetal source, and with associated conditions (eg, gestational hypertension, fetal growth restriction, or HELLP syndrome).
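The article counts above form a filtering funnel. A short calculation makes the stage-to-stage retention explicit, assuming each stage is a subset of the one before it and treating "more than 22 million" as exactly 22 million (so the first ratio is an upper bound). Under those assumptions, roughly 8% of preeclampsia articles were flagged by the data-mining tool and roughly a third of those were accepted after curation. The `retention` helper is illustrative only.

```python
# Article counts reported in the abstract; the PubMed total is a lower bound
# ("more than 22 million"), so the first retention ratio is an upper bound.
stages = [
    ("PubMed records", 22_000_000),
    ("preeclampsia articles", 28_000),
    ("flagged by data-mining tool", 2_300),
    ("accepted after curation", 729),
]


def retention(stages):
    """Return (stage name, fraction of the previous stage's count) for each later stage."""
    return [
        (name, count / prev_count)
        for (_, prev_count), (name, count) in zip(stages, stages[1:])
    ]


for name, frac in retention(stages):
    print(f"{name}: {frac:.1%} of previous stage")
```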
The gene sets and ontology groups identified through our systematic literature curation indicate that preeclampsia represents several distinct phenotypes with distinct and overlapping maternal and fetal genetic contributions.