Improved time series models based on EMD and CatBoost algorithms: taking PM2.5 prediction of Dalian City as an example

ZHAO Lingxiao, LI Zhiyang, QU Leilei

Journal of Nanjing Forestry University (Natural Sciences Edition), 2024, Vol. 48, Issue (3): 268-274.

DOI: 10.12302/j.issn.1000-2006.202205005
Research article




Abstract

【Objective】 This study addresses the low accuracy of traditional time series prediction of atmospheric PM2.5 concentration. It aims to reduce the influence of the nonlinearity, high noise, non-stationarity and volatility of PM2.5 time series on prediction, so that PM2.5 concentration can be predicted more accurately. 【Method】 Using PM2.5 data recorded during haze weather in Dalian City from January 1, 2014 to January 31, 2022 as an example, a hybrid machine learning time series model was proposed that combines empirical mode decomposition (EMD), categorical boosting (CatBoost) and the autoregressive integrated moving average model (ARIMA). It was compared with the traditional autoregressive model (AR), with ARIMA, and with hybrid models that add only the EMD method. 【Result】 Relative to the original series, the hybrid EMD-CatBoost-ARIMA model improved the root mean square error (RMSE) by 20.76%, the mean absolute error (MAE) by 17.40%, and the Theil inequality coefficient (TIC) by 29.17%. 【Conclusion】 For reconstructed sequences with high entropy values, the EMD method and the CatBoost algorithm can significantly improve the prediction performance of PM2.5 time series models. Compared with traditional time series models, the EMD-CatBoost-ARIMA model predicts atmospheric PM2.5 concentration more accurately.
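The three error measures reported in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the abstract does not give the formulas, so the TIC below assumes the common form (RMSE divided by the sum of the root mean squares of the observed and predicted series), and the percentage improvement assumes a simple relative reduction against a baseline error. All function names and the sample values are hypothetical.

```python
import math


def _check(actual, predicted):
    # Both series must be non-empty and of equal length.
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")


def rmse(actual, predicted):
    """Root mean square error."""
    _check(actual, predicted)
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def tic(actual, predicted):
    """Theil inequality coefficient (assumed form, bounded in [0, 1]).

    0 means a perfect fit; values near 1 mean a very poor fit.
    """
    _check(actual, predicted)
    n = len(actual)
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    rms_predicted = math.sqrt(sum(p * p for p in predicted) / n)
    return rmse(actual, predicted) / (rms_actual + rms_predicted)


def improvement_pct(baseline_error, model_error):
    """Relative reduction of an error measure, in percent (assumed definition)."""
    return (baseline_error - model_error) / baseline_error * 100.0


if __name__ == "__main__":
    # Hypothetical observed and predicted PM2.5 values, for illustration only.
    observed = [35.0, 42.0, 58.0, 61.0, 47.0]
    baseline = [30.0, 50.0, 50.0, 70.0, 40.0]
    hybrid = [34.0, 44.0, 55.0, 63.0, 45.0]

    for name, fn in (("RMSE", rmse), ("MAE", mae), ("TIC", tic)):
        gain = improvement_pct(fn(observed, baseline), fn(observed, hybrid))
        print(f"{name} improvement: {gain:.2f}%")
```

Under these assumptions, a figure such as the reported 20.76% RMSE improvement would correspond to `improvement_pct(rmse_baseline, rmse_hybrid)`, where both errors are computed on the same test period.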

Key words

PM2.5 concentration / empirical mode decomposition (EMD) / time series model / hybrid model / CatBoost algorithm / machine learning / Dalian City

Cite this article

ZHAO Lingxiao, LI Zhiyang, QU Leilei. Improved time series models based on EMD and CatBoost algorithms: taking PM2.5 prediction of Dalian City as an example[J]. Journal of Nanjing Forestry University (Natural Sciences Edition), 2024, 48(3): 268-274. https://doi.org/10.12302/j.issn.1000-2006.202205005
CLC number: X513


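Funding

Doctoral Scientific Research Start-up Foundation of Liaoning Province (2020-BS-216)
National Undergraduate Innovation and Entrepreneurship Training Program (202110158002)
Liaoning Provincial Undergraduate Innovation and Entrepreneurship Training Program (S202210158006)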

Editors: MENG Miaojing, ZHENG Yanyi