JOURNAL OF NANJING FORESTRY UNIVERSITY ›› 2024, Vol. 48 ›› Issue (3): 268-274. doi: 10.12302/j.issn.1000-2006.202205005

Improved time series models based on EMD and CatBoost algorithms: taking PM2.5 prediction of Dalian City as an example

ZHAO Lingxiao1,2, LI Zhiyang3, QU Leilei4,*

    1. College of Marine and Civil Engineering, Dalian Ocean University, Dalian 116023, China
    2. Department of Atmospheric and Oceanic Sciences, Fudan University, Shanghai 200438, China
    3. College of Civil Engineering, Chongqing University, Chongqing 400044, China
    4. College of Information Engineering, Dalian Ocean University, Dalian 116023, China
  • Received: 2022-05-04 Revised: 2023-08-11 Online: 2024-05-30 Published: 2024-06-14

Abstract:

【Objective】 This study addresses the low accuracy of traditional time series prediction of PM2.5 concentration. It aims to reduce the impact of nonlinearity, high noise, instability and volatility on PM2.5 time series so that PM2.5 concentration can be predicted more accurately. 【Method】 Haze PM2.5 data for Dalian City from January 1, 2014 to January 31, 2022 were used as an example. A hybrid machine learning time series model was proposed that combines empirical mode decomposition (EMD), categorical boosting (CatBoost) and the autoregressive integrated moving average (ARIMA) model. It was compared with the traditional autoregressive (AR) model, the ARIMA model, and a hybrid model that uses only the EMD method. 【Result】 The hybrid EMD-CatBoost-ARIMA model improved the root mean square error (RMSE) of the original sequence by 20.76%, the mean absolute error (MAE) by 17.40%, and the Theil inequality coefficient (TIC) by 29.17%. 【Conclusion】 For reconstructed sequences with high entropy values, the EMD method and the CatBoost algorithm can significantly improve the prediction performance of PM2.5 time series models. Compared with traditional time series models, the EMD-CatBoost-ARIMA model performs better in PM2.5 concentration prediction.
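The three evaluation metrics named in the Result can be illustrated with a minimal Python sketch. The abstract does not give formulas, so the definitions below are the commonly used ones and are an assumption here, in particular the TIC form (RMSE divided by the sum of the root mean squares of the observed and predicted series). The function names and the helper `improvement_pct` are hypothetical and are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def tic(actual, predicted):
    """Theil inequality coefficient (assumed form, bounded in [0, 1]).

    TIC = RMSE / (sqrt(mean(actual^2)) + sqrt(mean(predicted^2)));
    0 indicates a perfect fit.
    """
    n = len(actual)
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    rms_predicted = math.sqrt(sum(p * p for p in predicted) / n)
    return rmse(actual, predicted) / (rms_actual + rms_predicted)


def improvement_pct(baseline, improved):
    """Relative reduction of an error metric, in percent (hypothetical helper)."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # Made-up numbers for illustration only; not data from the study.
    observed = [10.0, 20.0, 30.0]
    forecast = [12.0, 18.0, 33.0]
    print(rmse(observed, forecast), mae(observed, forecast), tic(observed, forecast))
```

Under these definitions, a reported improvement such as 20.76% in RMSE would correspond to `improvement_pct(rmse_baseline, rmse_hybrid)`, assuming the improvement is measured as a relative reduction in error.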

Key words: PM2.5 concentration, empirical mode decomposition (EMD), time series model, hybrid model, CatBoost algorithm, machine learning, Dalian City

CLC Number: