Department of Bioinformatics - BiGCaT, NUTRIM School of Nutrition and Translational Research in Metabolism, Maastricht University, Maastricht, The Netherlands.
Governor Kremers Centre - Rett Expertise Centre, Maastricht University Medical Center, Maastricht, The Netherlands.
Sci Data. 2021 May 4;8(1):124. doi: 10.1038/s41597-021-00905-y.
Here, we describe a dataset of monogenic rare diseases with a known genetic background, supplemented with manually extracted provenance for both the first description of each disease and the discovery of its underlying genetic cause. We assembled a collection of 4166 rare monogenic diseases and linked them to 3163 causative genes, annotated with OMIM and Ensembl identifiers and HGNC symbols. The PubMed identifiers of the publications that first described the rare diseases, and of the publications that identified the genes causing them, were added using information from OMIM, PubMed, Wikipedia, whonamedit.com, and Google Scholar. The data are available under a CC0 license as a spreadsheet and as RDF in a semantic model modified from DisGeNET, and have been added to Wikidata. Although the dataset relies only on publicly available data and on publications with a PubMed identifier, our effort to make the data interoperable and linked now allows us to analyse it. Our analysis revealed the timeline of rare disease and causative gene discovery and links it to developments in methods.
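To make the linked structure more concrete, the following is a minimal Python sketch of one disease-gene record and of how a discovery timeline could be derived from such records. The class `DiseaseGeneLink`, its field names and the helper functions are hypothetical illustrations, not the dataset's actual spreadsheet columns or RDF predicates, and the sketch assumes that a publication year has already been looked up for each PubMed identifier.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DiseaseGeneLink:
    """One disease-gene association (hypothetical field names)."""
    omim_disease_id: str           # OMIM identifier of the rare disease
    hgnc_symbol: str               # HGNC symbol of the causative gene
    ensembl_gene_id: str           # Ensembl gene identifier
    disease_pmid: Optional[str]    # PubMed ID of the first disease description
    gene_pmid: Optional[str]       # PubMed ID of the gene discovery publication
    disease_year: Optional[int]    # assumed: year looked up via disease_pmid
    gene_year: Optional[int]       # assumed: year looked up via gene_pmid


def discovery_lags(links: Iterable[DiseaseGeneLink]) -> List[int]:
    """Years from first disease description to gene discovery.

    Records missing either year are skipped.
    """
    return [
        link.gene_year - link.disease_year
        for link in links
        if link.disease_year is not None and link.gene_year is not None
    ]


def median_lag(links: Iterable[DiseaseGeneLink]) -> Optional[float]:
    """Median discovery lag, or None if no record has both years."""
    lags = discovery_lags(links)
    return median(lags) if lags else None


def genes_per_decade(links: Iterable[DiseaseGeneLink]) -> Counter:
    """Count gene discoveries per decade (e.g. 1995 -> 1990)."""
    return Counter(
        (link.gene_year // 10) * 10
        for link in links
        if link.gene_year is not None
    )
```

Keeping PubMed identifiers and years as optional fields reflects that provenance could not necessarily be found for every disease or gene, so any timeline statistic has to skip incomplete records explicitly rather than treat them as zero.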