JOURNAL OF NANJING FORESTRY UNIVERSITY ›› 2022, Vol. 46 ›› Issue (5): 152-160. doi: 10.12302/j.issn.1000-2006.202106023

Comparative analysis of hourly PM2.5 prediction based on multiple machine learning models

CHEN Jiankun, MU Fengyun, ZHANG Yongchuan, TIAN Tian, WANG Junxiu

  1. School of Smart City, Chongqing Jiaotong University, Chongqing 400074, China
  • Received: 2021-06-19 Revised: 2021-09-05 Online: 2022-09-30 Published: 2022-10-19
  • Contact: MU Fengyun E-mail: 1561979759@qq.com; mfysd@cqjtu.edu.cn

Abstract:

【Objective】Six PM2.5 concentration prediction models were compared to support accurate and timely prediction of ambient PM2.5 concentration: XGBoost, LightGBM, random forest (RF), K-nearest neighbors (KNN), long short-term memory neural network (LSTM), and decision tree (DT).【Method】Using a full year (2020) of air quality monitoring data and meteorological data for Hechuan District, Chongqing, the maximum correlation minimum redundancy (MRMR) algorithm was applied to reduce the data dimensionality and select the optimal feature subset, which served as the model input. Hourly PM2.5 concentration was then predicted with each model in turn. Because PM2.5 concentration varies considerably between seasons, predictions were also made separately for each season to examine the prediction performance of each model, and the running time and memory usage of each model were measured. The causes of the seasonal differences in prediction performance are discussed on the basis of the correlation between PM2.5 and the feature variables and the importance of those variables.【Result】In overall prediction accuracy, the models rank, from highest to lowest, XGBoost, RF, LightGBM, LSTM, KNN, and DT. All six models predict more accurately in autumn and winter than in spring and summer. LightGBM considerably reduces training time and memory usage while maintaining accuracy. In the feature importance analysis, PM10, temperature, and air pressure rank high, whereas O3, wind direction, and NO2 rank relatively low.【Conclusion】The optimal feature subset selected by MRMR dimensionality reduction allows PM2.5 concentration to be predicted well. Of the six models, XGBoost, RF, LightGBM, and LSTM perform better in PM2.5 prediction, with LightGBM showing the better overall performance among them.
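
The MRMR feature selection step described in the Method can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the greedy "difference" form of MRMR (relevance minus mean redundancy) and uses the absolute Pearson correlation as a stand-in for the relevance and redundancy measures, which the abstract does not specify. The function names `pearson` and `mrmr_select`, the feature `pm10_copy`, and all data values are hypothetical and synthetic.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(vx * vy)


def mrmr_select(features, target, k):
    """Greedy MRMR over a dict mapping feature name -> list of values.

    Assumption: |Pearson r| is used for both relevance (feature vs. target)
    and redundancy (feature vs. already selected features).
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data only; these are NOT the Hechuan District measurements.
rng = random.Random(0)
n = 500
pm10 = [rng.gauss(60, 20) for _ in range(n)]
pm10_copy = [v + rng.gauss(0, 1) for v in pm10]  # near-duplicate of pm10
temperature = [rng.gauss(18, 8) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]
pm25 = [0.6 * p - 0.8 * t + rng.gauss(0, 3) for p, t in zip(pm10, temperature)]

features = {
    "pm10": pm10,
    "pm10_copy": pm10_copy,
    "temperature": temperature,
    "noise": noise,
}
selected = mrmr_select(features, pm25, k=2)
print(selected)
```

On this synthetic data the first pick is one of the two PM10 columns, and the second pick is temperature rather than the other PM10 column: the near-duplicate is highly relevant but almost entirely redundant, which is the behavior MRMR is designed to produce.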

Key words: PM2.5 prediction, machine learning, maximum correlation minimum redundancy (MRMR), meteorological factors

CLC Number: