Chowdhury Rup, Nur Fernaz Narin, Islam Muhammad Nazrul, Islam Md Nazmul, Das Prapti, Afridi Arafat Sahin
Department of Computer Science and Engineering, Military Institute of Science and Technology, Mirpur Cantonment, Dhaka, 1216, Bangladesh.
Department of Computer Science and Engineering, Notre Dame University Bangladesh, 2/A, Arambagh, Motijheel, Dhaka, 1000, Bangladesh.
Data Brief. 2025 May 30;61:111727. doi: 10.1016/j.dib.2025.111727. eCollection 2025 Aug.
Precision agriculture harnesses data-driven techniques to optimize crop production, resource use, and sustainability. However, low-income countries like Bangladesh face a shortage of localized, high-quality datasets that reflect regional agroclimatic conditions and cropping practices. To address this gap, we present SPAS-Dataset-BD, a robust dataset compiled through a hybrid approach: secondary extraction from the Bangladesh Bureau of Statistics (BBS) 2022 Yearbook and primary on-field surveys of 223 farmers across ten diverse districts. The dataset comprises 4191 records covering 73 crop types, including underrepresented species, with 12 agronomic and environmental features. Robustness is demonstrated via threshold-based missing-value handling (<5% deletion, targeted imputation), hash-based deduplication, and cross-validation against official statistics. We illustrate potential applications in machine learning (73-class crop classification, yield forecasting) and IoT-driven irrigation scheduling. SPAS-Dataset-BD's scale, methodological transparency, and contextual richness make it a valuable resource for precision agriculture research and policy-making in South Asia.
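The cleaning steps named in the abstract (threshold-based missing-value handling and hash-based deduplication) can be illustrated with a minimal sketch. It is not the authors' pipeline. Several points are assumptions: that "<5% deletion" means rows are dropped when fewer than 5% of the values in a column are missing, that "targeted imputation" can be stood in for by a column median, that duplicates are detected with a SHA-256 fingerprint of normalized field values, and that the field names (`crop`, `district`, `rainfall`) are hypothetical placeholders rather than the dataset's actual columns.

```python
import hashlib
from statistics import median


def row_hash(record, keys):
    """Return a SHA-256 fingerprint of the selected fields (assumed scheme)."""
    # Normalize case and surrounding whitespace so trivial variants collide.
    payload = "|".join(str(record.get(k)).strip().lower() for k in keys)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate(records, keys):
    """Keep the first record seen for each fingerprint."""
    seen = set()
    unique = []
    for rec in records:
        fingerprint = row_hash(rec, keys)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(rec)
    return unique


def handle_missing(records, column, threshold=0.05):
    """Drop rows if the missing share is below the threshold, else impute.

    Median imputation is an assumption; the abstract only says "targeted".
    """
    missing = [r for r in records if r.get(column) is None]
    if not records or not missing:
        return list(records)
    share = len(missing) / len(records)
    if share < threshold:
        # Few gaps: deleting the affected rows loses little information.
        return [r for r in records if r.get(column) is not None]
    observed = [r[column] for r in records if r.get(column) is not None]
    if not observed:
        return list(records)  # nothing to impute from
    fill = median(observed)
    return [{**r, column: fill} if r.get(column) is None else r
            for r in records]


# Hypothetical sample rows; field names and values are illustrative only.
sample = [
    {"crop": "Rice", "district": "Dhaka", "rainfall": 200.0},
    {"crop": "rice ", "district": "dhaka", "rainfall": 200.0},
    {"crop": "Jute", "district": "Bogura", "rainfall": None},
    {"crop": "Wheat", "district": "Rajshahi", "rainfall": 100.0},
]
cleaned = handle_missing(
    deduplicate(sample, ["crop", "district", "rainfall"]), "rainfall"
)
```

Deduplicating before handling missing values keeps repeated rows from inflating or deflating the missing share that the threshold is compared against.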