Comparative analysis of hourly PM2.5 prediction based on multiple machine learning models

CHEN Jiankun; MU Fengyun; ZHANG Yongchuan; TIAN Tian; WANG Junxiu

doi:10.12302/j.issn.1000-2006.202106023

Journal of Nanjing Forestry University (Natural Sciences Edition), 2022, Vol. 46, Issue 5: 152-160.

Abstract

【Objective】Six PM2.5 concentration prediction models were compared: XGBoost, LightGBM, random forest (RF), K-nearest neighbor (KNN), long short-term memory neural network (LSTM), and decision tree (DT), with the aim of supporting accurate and timely prediction of ambient PM2.5 concentration.【Method】Based on a full year (2020) of air quality monitoring data and meteorological data for Hechuan District, Chongqing City, the maximum relevance minimum redundancy (MRMR) algorithm was used to reduce the data dimensionality and select the optimal feature subset, which served as the model input. PM2.5 concentration was then predicted with each model in turn. Because PM2.5 concentration varies considerably between seasons, predictions were also made separately for each season to explore the performance of each model, and the running time and memory usage of each model were recorded. The causes of the seasonal differences in prediction performance are discussed on the basis of the correlation between PM2.5 and the feature variables and the importance of those variables.【Result】Overall prediction accuracy ranked, from highest to lowest, XGBoost, RF, LightGBM, LSTM, KNN, and DT. All six models predicted more accurately in autumn and winter than in spring and summer. LightGBM considerably reduced training time and memory usage while maintaining accuracy. Feature importance was high for PM₁₀, temperature, and air pressure, and relatively low for O₃, wind direction, and NO₂.【Conclusion】The optimal feature subset selected by MRMR dimensionality reduction predicts PM2.5 concentration well. Of the models compared, XGBoost, RF, LightGBM, and LSTM perform better in PM2.5 prediction, and among them LightGBM has the best overall performance.
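The MRMR feature selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: MRMR is usually formulated with mutual information, whereas the version below swaps in absolute Pearson correlation so that it runs with the Python standard library alone. The function names (pearson, mrmr_select), the feature names, and the data are all hypothetical and synthetic; none of the values come from the paper.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_select(features, target, k):
    """Greedy mRMR-style selection (correlation-based variant, an assumption).

    features: dict mapping feature name -> list of values
    target:   list of target values (here, a stand-in for PM2.5)
    k:        number of features to keep
    At each step, pick the feature with the highest
    relevance - mean redundancy against the already selected features.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic demo data (NOT from the paper): the target depends on pm10 and temp;
# pm10_dup is a near copy of pm10 and noise is unrelated to the target.
rng = random.Random(0)
n = 500
pm10 = [rng.gauss(0, 1) for _ in range(n)]
temp = [rng.gauss(0, 1) for _ in range(n)]
features = {
    "pm10": pm10,
    "pm10_dup": [v + 0.01 * rng.gauss(0, 1) for v in pm10],
    "temp": temp,
    "noise": [rng.gauss(0, 1) for _ in range(n)],
}
target = [0.8 * p - 0.5 * t + 0.1 * rng.gauss(0, 1) for p, t in zip(pm10, temp)]

selected = mrmr_select(features, target, k=2)
print(selected)
```

In this toy setup the redundancy term is what keeps the near-duplicate column out of the subset: once one of the two PM₁₀-like columns is chosen, the other scores poorly despite its high relevance, so the second slot goes to the temperature-like column instead.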

Key words

PM2.5 prediction / machine learning / maximum relevance minimum redundancy (MRMR) / meteorological factors

Cite this article


CHEN Jiankun, MU Fengyun, ZHANG Yongchuan, et al. Comparative analysis of hourly PM2.5 prediction based on multiple machine learning models[J]. Journal of Nanjing Forestry University (Natural Sciences Edition). 2022, 46(5): 152-160. https://doi.org/10.12302/j.issn.1000-2006.202106023






Received: 2021-06-19
Revised: 2021-09-05
Published: 2022-09-30
Issue Date: 2022-10-19
