Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia.
The European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
BMC Bioinformatics. 2017 Dec 21;18(Suppl 17):561. doi: 10.1186/s12859-017-1978-0.
Cell lines and cell types are extensively studied in biomedical research, yielding a significant number of publications each year. Identifying cell lines and cell types precisely in publications is crucial for scientific reproducibility and knowledge integration. There are efforts to standardise cell nomenclature through ontology development, in support of the FAIR principles for cell knowledge. However, it is important to analyse the usage of cell nomenclature in publications at a large scale to understand how widely scientists have adopted it in the literature. In this study, we use text mining methods to analyse the usage of cell nomenclature, both in vivo and in vitro, in the biomedical literature, and present our results.
We identified 59% of the cell type classes in the Cell Ontology and 13% of the cell line classes in the Cell Line Ontology in the literature. Our analysis showed that cell line nomenclature is much more ambiguous than cell type nomenclature. However, trends indicate that scientists are increasingly using standardised nomenclature for cell lines and cell types in publications.
Our findings provide insight into how experimental cells are described in publications. They may allow for improved standardisation of cell type and cell line nomenclature, and can be used to develop efficient text mining applications for cell types and cell lines. All data generated in this study are available at https://github.com/shenay/CellNomenclatureStudy.