Senseable City Lab, Department of Urban Studies and Planning, Massachusetts Institute of Technology, Cambridge, MA 02139.
China Future City Lab and Center for Real Estate, Department of Urban Studies and Planning, Massachusetts Institute of Technology, Cambridge, MA 02139.
Proc Natl Acad Sci U S A. 2019 Jul 30;116(31):15447-15452. doi: 10.1073/pnas.1903064116. Epub 2019 Jul 15.
Access to high-resolution, timely socioeconomic data, such as data on population, employment, and enterprise activity at the neighborhood level, is critical for social scientists and policy makers who design and implement location-based policies. However, in many developing countries or cities, reliable local-scale socioeconomic data remain scarce. Here, we show that an easily accessible and frequently updated location attribute, restaurants, can be used to accurately predict a range of socioeconomic attributes of urban neighborhoods. We merge restaurant data from an online platform with 3 microdatasets for 9 Chinese cities. Using features extracted from restaurants, we train machine-learning models to estimate daytime and nighttime population, number of firms, and consumption level at various spatial resolutions. The trained model can explain 90 to 95% of the variation of those attributes across neighborhoods in the test dataset. We analyze the tradeoff between accuracy, spatial resolution, and number of training samples, as well as the heterogeneity of the predicted results across different spatial locations, demographics, and firm industries. Finally, we demonstrate the cross-city generality of this method by training the model in one city and then applying it directly to other cities. The transferability of this restaurant model can help bridge data gaps between cities, allowing all cities to enjoy big data and algorithm dividends.
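The cross-city evaluation described in the abstract (train in one city, apply directly to another, report the share of variation explained) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the single feature (restaurant count per grid cell) and the functions `fit_line`, `r_squared`, and `synthetic_city` are hypothetical, and ordinary least squares stands in for the machine-learning models, which the abstract does not specify.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (a stand-in, not the paper's model)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(y_true, y_pred):
    """Share of the variance in y_true that is explained by y_pred."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def synthetic_city(n_cells, rng):
    """Hypothetical grid cells: restaurant count and a noisy 'population'."""
    restaurants = [rng.randint(0, 200) for _ in range(n_cells)]
    # Assumed relationship, chosen only so the example has signal to recover.
    population = [500 + 40 * r + rng.gauss(0, 400) for r in restaurants]
    return restaurants, population


rng = random.Random(0)
x_a, y_a = synthetic_city(300, rng)  # "city A": used for training
x_b, y_b = synthetic_city(300, rng)  # "city B": never seen during training

intercept, slope = fit_line(x_a, y_a)
pred_b = [intercept + slope * x for x in x_b]
r2_transfer = r_squared(y_b, pred_b)
print(f"R^2 when the city A model is applied to city B: {r2_transfer:.2f}")
```

Because both synthetic cities share the same underlying relationship, the transfer score here is high by construction; with real data the score would depend on how similar the cities are.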