Urdang Jacqueline G, Masters Stephanie, Edokobi Nneoma, Mukherjee Chitra, Quazi Arnib, Liem David A, Ahrens Monica, Wang Xuan, Whitham Megan
Virginia Tech Carilion School of Medicine, Roanoke, Virginia, USA.
Carilion Clinic, Roanoke, Virginia, USA.
Physiol Rep. 2025 Mar;13(6):e70262. doi: 10.14814/phy2.70262.
This study aims to demonstrate that text phrase-mining and natural language processing (NLP) can annotate large volumes of obstetric text to discover and evaluate maternal protein/gene (MPG)-disease interactions involved in the preeclampsia pathway. We employed a phrase-mining/NLP pipeline to evaluate unique MPGs involved in six cardiovascular derangements with overlapping presentations during pregnancy. The diseases were matched to Medical Subject Headings (MeSH) terms, and a text corpus was built from PubMed abstracts matched to those terms. Each MPG-disease pair was processed and assigned a unique score, whose components were calculated and weighted for distinctness, integrity, and popularity. Statistical analyses were conducted to examine protein-disease relationships. Forty-four MPGs with known associations to cardiovascular disease and preeclampsia pathways were identified across the six diseases. MPGs shared across the greatest number of disease states were implicated in (1) angiogenesis and vasoconstriction, (2) hemodynamic regulation, (3) hormonal regulation of metabolism, and (4) inflammation. NLP and text phrase-mining were successfully applied to obstetric abstracts with accuracy and speed. This approach holds promise for synthesizing large volumes of data to present trends in the obstetric literature and to identify promising biomarkers.
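The abstract names the three score components (distinctness, integrity, popularity) but does not give their formulas or weights. The following is therefore only an illustrative sketch of how a score per MPG-disease pair could be computed from a disease-keyed corpus of abstracts. Everything in it is an assumption rather than the authors' method: the function name `score_pairs`, the substring matching, the stand-in definitions of popularity and distinctness, the default integrity of 1.0, and the choice to combine the components as a product.

```python
from collections import Counter


def score_pairs(corpus, terms, integrity=None):
    """Score each (term, disease) pair found in the corpus.

    corpus:    dict mapping disease name -> list of abstract strings
    terms:     list of protein/gene names to look for
    integrity: optional dict term -> value in [0, 1]; hypothetical
               phrase-quality input, defaults to 1.0 for every term
    """
    integrity = integrity or {}

    # Count, per disease, how many abstracts mention each term
    # (naive case-insensitive substring match, an assumption).
    doc_freq = {disease: Counter() for disease in corpus}
    for disease, abstracts in corpus.items():
        for text in abstracts:
            lowered = text.lower()
            for term in terms:
                if term.lower() in lowered:
                    doc_freq[disease][term] += 1

    scores = {}
    for disease, abstracts in corpus.items():
        if not abstracts:
            continue
        for term in terms:
            hits = doc_freq[disease][term]
            if hits == 0:
                continue
            # Popularity (assumed): share of this disease's abstracts
            # that mention the term.
            popularity = hits / len(abstracts)
            # Distinctness (assumed): this disease's mention rate relative
            # to the summed mention rates over all diseases.
            total_rate = sum(
                doc_freq[d][term] / len(corpus[d]) for d in corpus if corpus[d]
            )
            distinctness = popularity / total_rate
            # Combining by product is an assumption, not the paper's weighting.
            scores[(term, disease)] = (
                integrity.get(term, 1.0) * popularity * distinctness
            )
    return scores


# Toy corpus invented for illustration; not data from the study.
toy_corpus = {
    "preeclampsia": ["sFlt-1 is elevated", "endoglin and sFlt-1", "no marker"],
    "cardiomyopathy": ["sFlt-1 rises", "troponin"],
}
scores = score_pairs(toy_corpus, ["sFlt-1", "troponin"])
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

Under these stand-in definitions, a term mentioned in only one disease gets a distinctness of 1.0, while a term mentioned at similar rates across all diseases is scored down. That matches the intuition of separating disease-specific MPGs from shared ones.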