Lin Duan-yi, Zhou Chang-en, Lai Xin-mei, Yang Shu-jing
Institute of Information Management of Fujian University of Traditional Chinese Medicine, Fuzhou 350003, China.
Zhongguo Zhong Yao Za Zhi. 2008 Sep;33(17):2094-6.
Scientific data are the source of knowledge innovation. Abundant data currently yield little information. To change this and to obtain useful knowledge with high information content, the data must be cleaned when the database is first established, so that they are accurate and free of noise. High-quality data come from high-quality data sources. However, incomplete, incorrect, and irregular data are widespread in the data sources for Chinese materia medica. Synonyms and homonyms are a serious problem, and different data sources give no unified description of the names and origins of Chinese materia medica. Data processing, including data analysis and research, is therefore essential when building a Chinese materia medica database. To obtain the most accurate and standardized data, this paper analyzed the medicinal plant entries in Xiandai Bencao Gangmu. The analyses covered the classification of the medicinal plants (the distribution across classes and the medicinal parts used), synonyms and homonyms, incorrect data, and the advantages and disadvantages of the data sources.
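One way to picture the synonym and homonym analysis is as two groupings over (name, species) records collected from several sources. A species recorded under more than one name is a synonym case. A name applied to more than one species is a homonym case. The following is a minimal sketch, not the paper's method. It assumes each source entry can be reduced to a (name, species) pair. The function name `find_synonyms_and_homonyms` and the placeholder records are hypothetical.

```python
from collections import defaultdict

def find_synonyms_and_homonyms(records):
    """Group hypothetical (name, species) pairs gathered from several sources.

    Returns (synonyms, homonyms):
      synonyms: species -> sorted names, for species recorded under >1 name
      homonyms: name -> sorted species, for names applied to >1 species
    """
    names_by_species = defaultdict(set)
    species_by_name = defaultdict(set)
    for name, species in records:
        # Light normalization: trim whitespace so trivial variants are not counted.
        name, species = name.strip(), species.strip()
        names_by_species[species].add(name)
        species_by_name[name].add(species)
    synonyms = {s: sorted(n) for s, n in names_by_species.items() if len(n) > 1}
    homonyms = {n: sorted(s) for n, s in species_by_name.items() if len(s) > 1}
    return synonyms, homonyms

# Hypothetical placeholder records, not data from the paper.
records = [
    ("Herb A", "Species X"),
    ("Herb B", "Species X"),   # same species, different name -> synonym
    ("Herb C", "Species Y"),
    ("Herb C", "Species Z"),   # same name, different species -> homonym
]
synonyms, homonyms = find_synonyms_and_homonyms(records)
print(synonyms)  # {'Species X': ['Herb A', 'Herb B']}
print(homonyms)  # {'Herb C': ['Species Y', 'Species Z']}
```

In practice the normalization step would need to be much richer than whitespace trimming, for example handling variant characters and transliterations. Flagged groups would still need expert review before records are merged or split.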