Comparative analysis of hourly PM2.5 prediction based on multiple machine learning models

CHEN Jiankun, MU Fengyun, ZHANG Yongchuan, TIAN Tian, WANG Junxiu

Journal of Nanjing Forestry University (Natural Sciences Edition) ›› 2022, Vol. 46 ›› Issue (5): 152-160.

DOI: 10.12302/j.issn.1000-2006.202106023
Research article


Abstract

【Objective】Six PM2.5 concentration prediction models were compared to support accurate and timely prediction of ambient PM2.5 concentration: XGBoost, LightGBM, random forest (RF), K-nearest neighbor (KNN), long short-term memory neural network (LSTM), and decision tree (DT).【Method】Using full-year 2020 air quality monitoring data and meteorological data for Hechuan District, Chongqing, the maximum relevance minimum redundancy (MRMR) algorithm was applied for dimensionality reduction to select an optimal feature subset, which served as the input to each model in turn for PM2.5 concentration prediction. Because PM2.5 concentration differs considerably between seasons, predictions were also made season by season. To assess the performance of each model, its running time and memory usage were measured, and the causes of the seasonal differences in prediction performance were examined using the correlation between PM2.5 and the feature variables together with the feature importance.【Result】Overall prediction accuracy ranked from highest to lowest as XGBoost, RF, LightGBM, LSTM, KNN, and DT. All six models predicted more accurately in autumn and winter than in spring and summer. LightGBM substantially reduced training time and memory usage while maintaining accuracy. Feature importance was high for PM10 concentration, air temperature, and air pressure, and comparatively low for O3 concentration, wind direction, and NO2 concentration.【Conclusion】The optimal feature subset selected by MRMR dimensionality reduction predicts PM2.5 concentration well. Of the six models, XGBoost, RF, LightGBM, and LSTM perform better in PM2.5 concentration prediction, and LightGBM has the best overall performance among them.
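The MRMR feature selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the original MRMR criterion is defined on mutual information, whereas this sketch swaps in absolute Pearson correlation as a simpler stand-in for both relevance and redundancy. The function names (`pearson`, `mrmr_select`), the feature names, and the synthetic data are all hypothetical and are not taken from the study.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    return sxy / math.sqrt(sxx * syy)


def mrmr_select(features, target, k):
    """Greedy MRMR-style selection of k feature names.

    features: dict mapping feature name -> list of values
    target:   list of target values (for example hourly PM2.5)
    Assumption: |Pearson r| replaces mutual information here.
    """
    relevance = {name: abs(pearson(col, target))
                 for name, col in features.items()}
    selected = []
    remaining = set(features)

    def score(name):
        # Relevance to the target minus mean redundancy with chosen features.
        if not selected:
            return relevance[name]
        redundancy = sum(abs(pearson(features[name], features[s]))
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() makes ties stable
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic, made-up data: PM2.5 depends on PM10 and temperature only.
rng = random.Random(0)
n = 1000
pm10 = [rng.gauss(60, 20) for _ in range(n)]
temperature = [rng.gauss(18, 8) for _ in range(n)]
pm10_noisy_copy = [v + rng.gauss(0, 10) for v in pm10]  # redundant with pm10
unrelated = [rng.gauss(0, 1) for _ in range(n)]
pm25 = [0.6 * a - 0.8 * b + rng.gauss(0, 3) for a, b in zip(pm10, temperature)]

features = {
    "pm10": pm10,
    "temperature": temperature,
    "pm10_noisy_copy": pm10_noisy_copy,
    "unrelated": unrelated,
}
chosen = mrmr_select(features, pm25, 2)
print(chosen)
```

In this toy setup the noisy copy of PM10 is strongly related to the target but is penalized for redundancy once PM10 itself has been chosen, which is the behavior the MRMR criterion is meant to capture. A study-grade version would more likely estimate mutual information and tune the subset size against model accuracy.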

Key words

PM2.5 prediction / machine learning / maximum relevance minimum redundancy (MRMR) / meteorological factors

Cite this article

CHEN Jiankun, MU Fengyun, ZHANG Yongchuan, et al. Comparative analysis of hourly PM2.5 prediction based on multiple machine learning models[J]. Journal of Nanjing Forestry University (Natural Sciences Edition), 2022, 46(5): 152-160. https://doi.org/10.12302/j.issn.1000-2006.202106023
Chinese Library Classification number: X513


Funding

National Key Research and Development Program of China (2019YFB2102503)
Natural Science Foundation of Chongqing (cstc2019jcyj-msxmX0626)

Editor: ZHENG Yanyi