Deng Lin
Department of Civil and Environmental Engineering, Hong Kong University of Science and Technology, Hong Kong, China.
PLoS One. 2025 May 19;20(5):e0321951. doi: 10.1371/journal.pone.0321951. eCollection 2025.
The automated valuation model (AVM) has been widely used by real estate stakeholders to provide accurate property value estimations automatically. Traditional valuation models are subjective and inaccurate, and previous studies have shown that machine learning (ML) approaches perform better in real estate valuation. These valuation models are based on structured tabular data, and few consider integrating multi-source unstructured data such as images. Most previous studies use a fixed feature space for model training without considering the variation in model performance introduced by different feature configuration parameters. To fill these gaps, this study uses Hong Kong as a case study and proposes an enhanced ML-based real estate valuation framework with feature configuration and multi-source image data fusion, including exterior housing photos, street view images, and remote sensing images. Eight ML regressors, namely Random Forest, Extra Tree, XGBoost, Light Gradient Boosting Machine (LightGBM), K-Nearest Neighbors (KNN), Support Vector Regression (SVR), Multilayer Perceptron (MLP), and Multiple Linear Regression (MLR), are used to formulate ML pipelines for training. The SHapley Additive exPlanations (SHAP) method is used to examine the effects of images on housing prices. The experimental results show that model performance differs significantly across feature configuration parameters, indicating that feature configuration is necessary to obtain more accurate and reliable predictions. Extra Tree performs significantly better than the other models. Half of the top 10 significant features are image features, and incorporating multi-source image features can improve property valuation accuracy. Nonlinear associations exist between image features and housing prices, and the spatial distribution patterns of image feature values and the corresponding SHAP main effects vary significantly from the city centre to the suburbs. These findings contribute to a better understanding of AVM development with image fusion and of the nonlinear associations between image features and housing prices for public authorities, urban planners, and real estate developers.
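The attribution idea behind SHAP can be illustrated with a small sketch. The code below is not the study's pipeline: it computes exact Shapley values by enumerating feature coalitions for a hypothetical three-feature price function, whereas SHAP implementations typically approximate these values for trained models. The function `toy_price`, its coefficients, the feature names (floor area, greenery share from a street view image, building age), and the baseline instance are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline instance.

    Features outside a coalition are set to their baseline value.
    The cost grows exponentially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_price(features):
    """Hypothetical price function with a nonlinear image-feature term."""
    area, greenery, age = features
    return 100.0 * area + 50.0 * greenery ** 2 - 2.0 * age + 10.0 * area * greenery


x = [0.8, 0.6, 10.0]          # instance to explain (assumed values)
baseline = [0.5, 0.2, 20.0]   # reference instance (assumed values)
phi = shapley_values(toy_price, x, baseline)

for name, value in zip(["area", "greenery", "age"], phi):
    print(f"{name}: {value:.3f}")
```

The attributions sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the additivity property that gives SHAP its name. The area-greenery interaction term is shared between those two features, while the purely additive age term is attributed to age alone.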