College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China.
Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Hohhot, Inner Mongolia, China.
PeerJ. 2024 Oct 3;12:e18202. doi: 10.7717/peerj.18202. eCollection 2024.
Potato is the fourth largest food crop in the world, but potato cultivation faces serious threats from various diseases and pests. Despite significant advancements in research on potato disease resistance, these findings are scattered across numerous publications. For researchers, obtaining relevant knowledge by reading and organizing a large body of literature is a time-consuming and labor-intensive process. Therefore, systematically extracting and organizing the relationships between potato genes and diseases from the literature to establish a potato gene-disease knowledge base is particularly important. Unfortunately, there is currently no such gene-disease knowledge base available.
In this study, we constructed a Potato Gene-Disease Knowledge Base (PotatoG-DKB) using natural language processing techniques and large language models. Using PubMed as the data source, we retrieved 2,906 article abstracts related to potato biology, extracted the entities and relationships linking potato genes and related diseases, and stored them in a Neo4j database. We also built the Potato Gene-Disease Knowledge Portal (PotatoG-DKP), a web-based interactive visualization platform.
PotatoG-DKB encompasses 22 entity types (such as genes, diseases, and species), with 5,206 nodes and 9,443 edges between entities (for example, gene-disease and pathogen-disease relationships). PotatoG-DKP intuitively displays the associations extracted from the literature and can help potato biologists and breeders understand potato pathogenesis and disease resistance. More details about PotatoG-DKP are available at https://www.potatogd.com.cn/.
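To illustrate how extracted entities and relationships can map onto typed nodes and edges in a graph database such as Neo4j, the following is a minimal sketch and not the authors' implementation. The `Triple` structure, the helper names `build_graph` and `to_cypher`, the relation names `ASSOCIATED_WITH` and `CAUSES`, and the placeholder entities (`GeneA`, `DiseaseX`, `PathogenP`) are all assumptions made for illustration; none of them come from the paper. The sketch only generates Cypher `MERGE` statements with parameters and does not connect to a database.

```python
from collections import Counter
from typing import NamedTuple


class Triple(NamedTuple):
    """One extracted relationship (hypothetical schema, not from the paper)."""
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str


def build_graph(triples):
    """Deduplicate triples into a set of typed nodes and a set of edges."""
    nodes = set()
    for t in triples:
        nodes.add((t.head, t.head_type))
        nodes.add((t.tail, t.tail_type))
    edges = set(triples)  # identical triples collapse into one edge
    return nodes, edges


def to_cypher(triple):
    """Return a (query, parameters) pair that upserts one edge.

    Cypher cannot parameterize labels or relationship types, so they are
    interpolated into the query text and validated first.
    """
    for label in (triple.head_type, triple.relation, triple.tail_type):
        if not label.isidentifier():
            raise ValueError(f"unsafe label: {label!r}")
    query = (
        f"MERGE (h:{triple.head_type} {{name: $head}}) "
        f"MERGE (t:{triple.tail_type} {{name: $tail}}) "
        f"MERGE (h)-[:{triple.relation}]->(t)"
    )
    return query, {"head": triple.head, "tail": triple.tail}


# Placeholder triples for illustration only; these are not data from PotatoG-DKB.
TRIPLES = [
    Triple("GeneA", "Gene", "ASSOCIATED_WITH", "DiseaseX", "Disease"),
    Triple("PathogenP", "Pathogen", "CAUSES", "DiseaseX", "Disease"),
    Triple("GeneA", "Gene", "ASSOCIATED_WITH", "DiseaseX", "Disease"),  # duplicate
]

nodes, edges = build_graph(TRIPLES)
type_counts = Counter(node_type for _, node_type in nodes)

if __name__ == "__main__":
    print(f"{len(nodes)} nodes, {len(edges)} edges, types: {dict(type_counts)}")
    for triple in sorted(edges):
        print(to_cypher(triple))
```

Using `MERGE` rather than `CREATE` means that an entity mentioned in several abstracts resolves to a single node, which is one plausible way to keep node and edge counts free of duplicates.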