Tian Dingcheng, Zhou Cui, Wang Yu, Zhang Ruyi, Yao Yudong
College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110819, China.
Research Institute for Medical and Biological Engineering, Ningbo University, Ningbo, 315211, China.
Data Brief. 2024 Apr 10;54:110405. doi: 10.1016/j.dib.2024.110405. eCollection 2024 Jun.
Chinese herbal medicine (CHM) is integral to the traditional Chinese medicine (TCM) system. Accurately identifying Chinese herbal medicines is crucial for quality control and for verifying compounded prescriptions. However, because there are many Chinese herbal medicines, some of which look alike but have different therapeutic effects, precise identification is challenging. Traditional manual identification is labor-intensive and inefficient. Deep learning techniques for Chinese herbal medicine identification can enhance accuracy, improve efficiency, and lower costs. However, few high-quality Chinese herbal medicine datasets are currently available for deep learning applications. To alleviate this problem, this study constructed a dataset (Dataset 1) containing 3,384 images of the fruits of 20 common Chinese herbal medicines, collected through web crawling. All images were annotated by TCM experts, making them suitable for training and testing Chinese herbal medicine identification methods. Furthermore, this study established a second dataset (Dataset 2) of 400 images taken with smartphones, to provide material for evaluating how well identification methods perform in practice. Together, the two datasets form the Ningbo Traditional Chinese Medicine Chinese Herb Medicine (NB-TCM-CHM) Dataset. In both Dataset 1 and Dataset 2, the images of each Chinese herbal medicine are stored in a separate folder named after that herb. The dataset can be used to develop deep learning-based Chinese herbal medicine identification algorithms and to evaluate the performance of identification methods.
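Because each herb's images sit in a folder named after the herb, the folder name can serve directly as the class label. The following is a minimal sketch of how such a layout might be indexed, not the authors' own tooling. The function names `index_image_folders` and `to_labeled_samples` and the set of image extensions are assumptions made for illustration; the abstract does not state which file formats the dataset uses.

```python
from pathlib import Path

# Assumed image extensions; the abstract does not state the file formats used.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def index_image_folders(root):
    """Map each class folder name under `root` to a sorted list of image paths."""
    root = Path(root)
    index = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Keep only files whose extension looks like an image.
        images = sorted(
            f for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
        index[class_dir.name] = images
    return index


def to_labeled_samples(index):
    """Flatten the index into (path, integer_label) pairs plus the class-name list."""
    class_names = sorted(index)
    samples = [
        (path, label)
        for label, name in enumerate(class_names)
        for path in index[name]
    ]
    return samples, class_names
```

Under these assumptions, calling `index_image_folders` on the Dataset 1 directory and on the Dataset 2 directory separately would give one index for training and testing and another for practical evaluation. Building the class-name list from Dataset 1 and reusing it for Dataset 2 keeps the integer labels consistent between the two.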