
Comparative analysis of hourly PM2.5 prediction based on multiple machine learning models
CHEN Jiankun, MU Fengyun, ZHANG Yongchuan, TIAN Tian, WANG Junxiu
Journal of Nanjing Forestry University, 2022, Vol. 46, Issue (5): 152-160.
【Objective】Six PM2.5 concentration prediction models were compared to support accurate and timely prediction of ambient PM2.5 concentration: XGBoost, LightGBM, random forest (RF), K nearest neighbor (KNN), long short-term memory neural network (LSTM), and decision tree (DT).
【Method】Based on a full year (2020) of air quality monitoring data and meteorological data for Hechuan District, Chongqing City, the maximum correlation minimum redundancy (MRMR) algorithm was used to reduce the data dimensionality and select the optimal feature subset, which served as the model input. PM2.5 concentrations were then predicted hour by hour. Because the PM2.5 concentration varies considerably between seasons, predictions were also made separately for each season to explore the prediction performance of each model, and the running time and memory usage of each model were recorded. The causes of the seasonal differences in prediction performance are discussed on the basis of the correlation between PM2.5 and the feature variables and of the importance of those variables.
【Result】In overall prediction accuracy, the models ranked, from highest to lowest, as XGBoost, RF, LightGBM, LSTM, KNN, and DT. All six models predicted more accurately in autumn and winter than in spring and summer. The LightGBM model considerably reduced training time and memory usage while maintaining model accuracy. The feature importance analysis shows that PM10, temperature, and air pressure have high importance, whereas O3, wind direction, and NO2 have relatively low importance.
【Conclusion】The optimal feature subset selected with the MRMR method for dimensionality reduction can better predict the PM2.5 concentration. In comparison, the XGBoost, RF, LightGBM, and LSTM models perform better in PM2.5 prediction; among them, LightGBM has the better comprehensive performance.
PM2.5 prediction / machine learning / maximum correlation minimum redundancy (MRMR) / meteorological factors
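The feature-selection step described in the abstract can be illustrated with a minimal sketch of greedy MRMR. This is not the authors' implementation: it assumes the "difference" form of the criterion (relevance minus mean redundancy) and uses the absolute Pearson correlation as a simple stand-in for the mutual information usually used in MRMR. The function names (`pearson`, `mrmr_select`), the feature names, and the synthetic data are all hypothetical and serve only to show the mechanism.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def mrmr_select(features, target, k):
    """Greedy MRMR, difference form, with |Pearson r| as an assumed relevance measure.

    features: dict mapping feature name -> list of values
    target:   list of target values (e.g. PM2.5 concentration)
    k:        number of features to keep
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]  # first pick: relevance only
            # Mean absolute correlation with the features already chosen.
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic toy data (hypothetical; not the Hechuan District measurements).
random.seed(0)
n = 500
pm10 = [random.gauss(60, 20) for _ in range(n)]
pm10_copy = [v + random.gauss(0, 10) for v in pm10]  # redundant with pm10
temp = [random.gauss(18, 8) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]        # unrelated to the target
pm25 = [0.6 * p - 0.8 * t + random.gauss(0, 3) for p, t in zip(pm10, temp)]

features = {"pm10": pm10, "pm10_copy": pm10_copy, "temp": temp, "noise": noise}
selected = mrmr_select(features, pm25, k=2)
print(selected)
```

In this toy setting the redundancy penalty keeps the second pick from being the near-duplicate PM10 column, even though that column is strongly correlated with the target; a less correlated but complementary variable is chosen instead. A mutual-information estimator could be substituted for `pearson` without changing the greedy loop.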