Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China.
The First Affiliated Hospital, Zhejiang University School of Medicine; Institute of Hematology, Zhejiang University, Hangzhou, 310058, China.
Sci Data. 2023 Dec 1;10(1):851. doi: 10.1038/s41597-023-02781-0.
Human aging is a natural and inevitable biological process that increases the risk of aging-related diseases. Developing anti-aging therapies for these diseases requires a comprehensive, multi-modal, and multi-faceted understanding of the mechanisms and effects of aging and longevity. However, most of the relevant knowledge is scattered across the biomedical literature, which has reached 36 million articles in PubMed. Here, we present HALD, a text mining-based human aging and longevity dataset: a biomedical knowledge graph built from all published literature related to human aging and longevity in PubMed. HALD integrates multiple state-of-the-art natural language processing (NLP) techniques to improve the accuracy and coverage of the knowledge graph for precision gerontology and geroscience analyses. As of September 2023, HALD contained 12,227 entities of 10 types (gene, RNA, protein, carbohydrate, lipid, peptide, pharmaceutical preparations, toxin, mutation, and disease), 115,522 relations, 1,855 aging biomarkers, and 525 longevity biomarkers extracted from 339,918 biomedical articles in PubMed. HALD is available at https://bis.zju.edu.cn/hald.