Tran Le-Thuy T, Brewster Philip J, Chidambaram Valliammai, Hurdle John F
Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT 84108, USA.
Nutrients. 2017 May 5;9(5):457. doi: 10.3390/nu9050457.
This study presents a method laying the groundwork for systematically monitoring food quality and the healthfulness of consumers' point-of-sale grocery purchases. The method automates the process of identifying United States Department of Agriculture (USDA) Food Patterns Equivalent Database (FPED) components of grocery food items. The input to the process is the compact abbreviated descriptions of food items that are similar to those appearing on the point-of-sale sales receipts of most food retailers. The FPED components of grocery food items are identified using Natural Language Processing techniques combined with a collection of food concept maps and relationships that are manually built using the USDA Food and Nutrient Database for Dietary Studies, the USDA National Nutrient Database for Standard Reference, the What We Eat In America food categories, and the hierarchical organization of food items used by many grocery stores. We have established the construct validity of the method using data from the National Health and Nutrition Examination Survey, but further evaluation of validity and reliability will require a large-scale reference standard with known grocery food quality measures. Here we evaluate the method's utility in identifying the FPED components of grocery food items available in a large sample of retail grocery sales data (~190 million transaction records).
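As a rough illustration of the kind of mapping the abstract describes, the sketch below expands an abbreviated receipt-style description and looks it up in a small concept map. It is a minimal sketch under stated assumptions and not the authors' implementation: the abbreviation table, the concept map, the component labels (plain-English stand-ins, not actual FPED variable names), and the function names `normalize` and `identify_components` are all hypothetical.

```python
import re

# Hypothetical abbreviation table; a real one would be far larger and
# built from retailer data.
ABBREVIATIONS = {
    "whl": "whole",
    "wht": "wheat",
    "brd": "bread",
    "mlk": "milk",
    "chkn": "chicken",
    "brst": "breast",
    "org": "organic",
}

# Hypothetical concept map: a set of food terms -> an illustrative
# component label (stand-ins, not actual FPED variable names).
CONCEPT_TO_COMPONENT = {
    ("whole", "wheat", "bread"): "whole grains",
    ("bread",): "refined grains",
    ("milk",): "dairy",
    ("chicken",): "protein foods (poultry)",
}


def normalize(description):
    """Lowercase, keep alphabetic tokens, and expand known abbreviations."""
    tokens = re.findall(r"[a-z]+", description.lower())
    return [ABBREVIATIONS.get(token, token) for token in tokens]


def identify_components(description):
    """Return component labels for the most specific matching concepts."""
    tokens = set(normalize(description))
    matches = [
        (concept, component)
        for concept, component in CONCEPT_TO_COMPONENT.items()
        if set(concept) <= tokens
    ]
    if not matches:
        return []
    # Prefer the concept that matches the most terms, so that
    # "whole wheat bread" wins over the generic "bread".
    longest = max(len(concept) for concept, _ in matches)
    return sorted({comp for concept, comp in matches if len(concept) == longest})


if __name__ == "__main__":
    for item in ["WHL WHT BRD 20OZ", "ORG 2% MLK GAL", "PAPER TOWELS"]:
        print(item, "->", identify_components(item))
```

Preferring the concept with the most matched terms is one simple way to let a specific entry override a generic one; the published method relies on manually built concept maps and relationships, which this toy lookup does not attempt to reproduce.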