JOURNAL OF NANJING FORESTRY UNIVERSITY ›› 2022, Vol. 46 ›› Issue (5): 152-160. doi: 10.12302/j.issn.1000-2006.202106023
CHEN Jiankun, MU Fengyun, ZHANG Yongchuan, TIAN Tian, WANG Junxiu
Received: 2021-06-19
Revised: 2021-09-05
Online: 2022-09-30
Published: 2022-10-19
Contact: MU Fengyun
E-mail: 1561979759@qq.com; mfysd@cqjtu.edu.cn
CHEN Jiankun, MU Fengyun, ZHANG Yongchuan, TIAN Tian, WANG Junxiu. Comparative analysis of hourly PM2.5 prediction based on multiple machine learning models[J]. JOURNAL OF NANJING FORESTRY UNIVERSITY, 2022, 46(5): 152-160.
Table 1
Hourly site monitoring data (pollutant columns are mass concentrations in μg·m⁻³)
Time | PM2.5 | PM10 | CO | SO2 | NO2 | O3 | Air temperature/℃ | Humidity/% | Wind speed/(m·s⁻¹) | Wind direction/(°) | Air pressure/kPa |
---|---|---|---|---|---|---|---|---|---|---|---|
2020-01-23 00:00:00 | 86 | 116 | 1.256 | 20 | 34 | 26 | 12.57 | 77.83 | 0.19 | 4 | 99.360 98 |
2020-01-23 01:00:00 | 88 | 119 | 1.350 | 22 | 48 | 10 | 12.67 | 79.87 | 0.15 | 11 | 99.364 36 |
2020-01-23 02:00:00 | 87 | 118 | 1.292 | 20 | 39 | 10 | 12.24 | 81.78 | 0.22 | 40 | 99.356 69 |
2020-01-23 03:00:00 | 86 | 116 | 1.230 | 19 | 35 | 13 | 12.12 | 81.43 | 0.16 | 36 | 99.355 83 |
2020-01-23 04:00:00 | 79 | 109 | 1.194 | 18 | 26 | 23 | 11.85 | 83.48 | 0.18 | 30 | 99.311 74 |
2020-01-23 05:00:00 | 92 | 115 | 1.194 | 19 | 28 | 18 | 11.79 | 83.86 | 0.18 | 67 | 99.277 15 |
2020-01-23 06:00:00 | 81 | 108 | 1.166 | 19 | 27 | 16 | 11.64 | 85.29 | 0.16 | 48 | 99.220 48 |
2020-01-23 07:00:00 | 82 | 109 | 1.200 | 18 | 32 | 11 | 11.58 | 85.88 | 0.23 | 31 | 99.237 00 |
2020-01-23 08:00:00 | 79 | 102 | 1.202 | 19 | 35 | 5 | 11.59 | 85.55 | 0.20 | 50 | 99.308 14 |
2020-01-23 09:00:00 | 80 | 98 | 1.248 | 17 | 30 | 9 | 11.60 | 86.42 | 0.20 | 83 | 99.375 48 |
2020-01-23 10:00:00 | 88 | 105 | 1.253 | 17 | 28 | 11 | 11.79 | 86.28 | 0.20 | 112 | 99.445 52 |
2020-01-23 11:00:00 | 95 | 121 | 1.375 | 16 | 33 | 12 | 12.48 | 84.48 | 0.27 | 158 | 99.490 83 |
2020-01-23 12:00:00 | 97 | 123 | 1.385 | 15 | 39 | 10 | 12.71 | 83.67 | 0.26 | 159 | 99.495 35 |
2020-01-23 13:00:00 | 99 | 125 | 1.330 | 15 | 32 | 19 | 13.29 | 81.55 | 0.43 | 136 | 99.416 11 |
2020-01-23 14:00:00 | 109 | 142 | 1.399 | 14 | 40 | 18 | 14.24 | 76.73 | 0.27 | 286 | 99.329 84 |
2020-01-23 15:00:00 | 101 | 135 | 1.332 | 14 | 29 | 38 | 14.78 | 72.54 | 0.34 | 296 | 99.249 16 |
2020-01-23 16:00:00 | 83 | 120 | 1.245 | 12 | 25 | 60 | 13.74 | 75.04 | 0.43 | 292 | 99.227 36 |
2020-01-23 17:00:00 | 92 | 133 | 1.272 | 14 | 27 | 58 | 13.57 | 76.78 | 0.32 | 295 | 99.229 41 |
2020-01-23 18:00:00 | 58 | 90 | 1.232 | 11 | 26 | 61 | 13.32 | 75.50 | 0.27 | 298 | 99.303 92 |
2020-01-23 19:00:00 | 89 | 132 | 1.299 | 13 | 35 | 50 | 12.58 | 80.99 | 0.39 | 256 | 99.384 96 |
2020-01-23 20:00:00 | 107 | 139 | 1.358 | 15 | 41 | 34 | 11.89 | 84.82 | 0.25 | 333 | 99.472 48 |
2020-01-23 21:00:00 | 113 | 142 | 1.417 | 16 | 50 | 16 | 11.32 | 87.04 | 0.24 | 11 | 99.582 50 |
2020-01-23 22:00:00 | 114 | 144 | 1.330 | 17 | 43 | 14 | 10.98 | 89.70 | 0.18 | 18 | 99.644 05 |
2020-01-23 23:00:00 | 113 | 140 | 1.293 | 16 | 36 | 11 | 10.80 | 90.43 | 0.14 | 13 | 99.655 27 |
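Table 2
Characteristics and parameter settings of the prediction models
Model | Model characteristics | Parameter settings |
---|---|---|
XGBoost | An ensemble machine learning algorithm based on decision trees; it uses gradient boosting as its framework and combines many weak classifiers into a strong classifier[ | n_estimators=300, booster="dart", max_depth=9, learning_rate=0.1, reg_lambda=0.2 |
Random forest (RF) | Based on classification trees; it improves prediction accuracy by aggregating a large number of trees, tolerates outliers and noise well, is not prone to overfitting during prediction, and serves as a nonlinear modeling tool[ | n_estimators=600, oob_score=True |
LightGBM | A gradient boosting decision tree framework offering fast training, high efficiency, low memory usage, high accuracy, support for parallel and GPU learning, and the ability to handle large-scale data[ | max_depth=9, num_leaves=30, subsample=0.5, learning_rate=0.1, min_data_in_leaf=21 |
K-nearest neighbors (KNN) | A method that classifies each record in a data set. For prediction, given an input X, KNN searches the historical inputs for the K nearest feature values and then takes a weighted estimate over those K values to obtain the predicted value[ | K=11, weights='distance' |
Decision tree (DT) | A machine learning algorithm for classification and regression tasks; it accepts one or more input variables and yields a single output, and it trains quickly with a small memory footprint[ | max_depth=19, min_samples_leaf=12, min_samples_split=6 |
Long short-term memory neural network (LSTM) | A special kind of RNN whose repeating unit differs from that of a standard RNN; the repeating unit is called a memory block. It mainly contains three gates (forget gate, input gate, output gate) and one memory cell. The cell state controls which information is passed on to the next time step. It has a clear advantage on nonlinear time-series data[ | output_dim=60, activation='relu', epochs=30, batch_size=72, loss='mae', optimizer='adam' |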
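The KNN entry in Table 2 describes a distance-weighted estimate over the K nearest historical samples. The following is a minimal pure-Python sketch of that idea, not the authors' implementation: the function name `knn_predict`, the Euclidean distance metric, and the choice of temperature and humidity as the two features are assumptions made for illustration. The sample rows are taken from Table 1.

```python
import math

def knn_predict(train_X, train_y, query, k=11):
    """Distance-weighted K-nearest-neighbour regression (illustrative sketch)."""
    # Euclidean distance from the query to every historical sample (assumed metric).
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Keep the k closest samples.
    nearest = sorted(dists, key=lambda d: d[0])[:k]
    # An exact match takes all the weight, which avoids dividing by zero.
    exact = [y for d, y in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    # Otherwise each neighbour is weighted by the inverse of its distance.
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

# (air temperature, humidity) -> PM2.5, first four rows of Table 1.
features = [(12.57, 77.83), (12.67, 79.87), (12.24, 81.78), (12.12, 81.43)]
pm25 = [86, 88, 87, 86]

# Hypothetical query point; k=3 only because the toy sample is tiny.
print(knn_predict(features, pm25, (12.30, 80.50), k=3))
```

In practice the features would normally be scaled before computing distances, since humidity and temperature have very different ranges; that step is omitted here for brevity.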
Table 4
Predicted PM2.5 pollution levels in Hechuan District compared across the six models
Grade | Measured value | XGBoost | RF | LightGBM | DT | KNN | LSTM |
---|---|---|---|---|---|---|---|
1 | 6 285 | 6 224 | 6 248 | 6 220 | 6 230 | 6 343 | 6 099 |
2 | 4 260 | 4 275 | 4 281 | 4 301 | 4 245 | 4 221 | 4 333 |
3 | 1 451 | 1 548 | 1 502 | 1 525 | 1 564 | 1 496 | 1 585 |
4 | 401 | 378 | 390 | 374 | 375 | 381 | 391 |
5 | 146 | 121 | 123 | 122 | 130 | 114 | 136 |
6 | 13 | 10 | 12 | 14 | 12 | 11 | 12 |
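Table 4 groups concentrations into six pollution grades. As a rough sketch of how such a tally can be produced, the code below maps PM2.5 values to grades 1 to 6 and counts them. The breakpoints (35, 75, 115, 150, 250 μg·m⁻³) are the 24-hour PM2.5 limits from HJ 633-2012; whether the article applies exactly these limits to hourly values is an assumption, and the names `pm25_grade` and `grade_counts` are hypothetical.

```python
from collections import Counter

# Upper bounds (μg/m³) of grades 1 to 5; anything above the last bound is grade 6.
# Assumed from the 24-hour PM2.5 breakpoints in HJ 633-2012.
PM25_UPPER_BOUNDS = [35, 75, 115, 150, 250]

def pm25_grade(concentration):
    """Return the pollution grade (1 to 6) for one PM2.5 concentration."""
    for grade, upper in enumerate(PM25_UPPER_BOUNDS, start=1):
        if concentration <= upper:
            return grade
    return 6

def grade_counts(values):
    """Count how many values fall into each grade, including empty grades."""
    counts = Counter(pm25_grade(v) for v in values)
    return {grade: counts.get(grade, 0) for grade in range(1, 7)}

# PM2.5 values from the first six rows of Table 1.
sample = [86, 88, 87, 86, 79, 92]
print(grade_counts(sample))
```

Running the same function over the measured series and over each model's predictions would give one column of counts per series, which is the layout Table 4 uses.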
Table 5
Model prediction results for different seasons
Model | Spring R² | Spring σRMSE | Spring σMAE | Spring σMAPE/% | Summer R² | Summer σRMSE | Summer σMAE | Summer σMAPE/% | Autumn R² | Autumn σRMSE | Autumn σMAE | Autumn σMAPE/% | Winter R² | Winter σRMSE | Winter σMAE | Winter σMAPE/% |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
XGBoost | 0.905 | 9.127 | 5.479 | 14.852 | 0.900 | 6.364 | 4.107 | 21.752 | 0.922 | 8.367 | 4.680 | 16.131 | 0.948 | 8.085 | 5.720 | 9.697 | |
RF | 0.901 | 9.298 | 5.573 | 15.247 | 0.906 | 6.174 | 4.066 | 21.736 | 0.915 | 8.718 | 4.823 | 16.636 | 0.946 | 8.267 | 5.952 | 10.041 | |
LightGBM | 0.903 | 9.214 | 5.754 | 16.440 | 0.902 | 6.295 | 4.156 | 22.912 | 0.915 | 8.717 | 4.952 | 17.790 | 0.942 | 8.567 | 6.096 | 10.339 | |
DT | 0.861 | 11.019 | 6.453 | 17.430 | 0.876 | 7.098 | 4.681 | 24.701 | 0.895 | 9.687 | 5.451 | 18.936 | 0.927 | 9.585 | 6.848 | 11.441 | |
KNN | 0.890 | 9.805 | 6.157 | 17.669 | 0.889 | 6.693 | 4.472 | 25.308 | 0.905 | 9.210 | 5.322 | 19.870 | 0.939 | 8.753 | 6.311 | 11.238 | |
LSTM | 0.889 | 9.845 | 6.028 | 17.639 | 0.885 | 6.822 | 4.455 | 24.485 | 0.932 | 7.816 | 5.033 | 18.477 | 0.937 | 8.877 | 6.342 | 10.949 |
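The four scores reported in Table 5 can be computed as in the sketch below. It assumes the standard definitions (coefficient of determination, root mean square error, mean absolute error, and mean absolute percentage error); the article's exact formulas are not shown in this section, and the function name `regression_metrics` and the toy numbers are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and MAPE (%) for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when an observed value is zero.
        "MAPE": 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }

# Toy values for illustration only; these are not data from the article.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(regression_metrics(observed, predicted))
```

Because MAPE divides by the observed value, it grows when concentrations are low, so it is worth reading alongside the absolute errors rather than on its own.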
[1] | FUZZI S, BALTENSPERGER U, CARSLAW K, et al. Particulate matter,air quality and climate:lessons learned and future needs[J]. Atmos Chem Phys, 2015, 15(14):8217-8299.DOI:10.5194/acp-15-8217-2015. |
[2] | CESARI D, DE BENEDETTO G E, BONASONI P, et al. Seasonal variability of PM2.5 and PM10 composition and sources in an urban background site in southern Italy[J]. Sci Total Environ, 2018, 612:202-213.DOI:10.1016/j.scitotenv.2017.08.230. |
[3] | MANISALIDIS I, STAVROPOULOU E, STAVROPOULOS A, et al. Environmental and health impacts of air pollution:a review[J]. Front Public Health, 2020, 8:14.DOI:10.3389/fpubh.2020.00014. |
[4] | KIM K H, KABIR E, KABIR S. A review on the human health impact of airborne particulate matter[J]. Environ Int, 2015, 74:136-143.DOI:10.1016/j.envint.2014.10.005. |
[5] | LIU H Y, DUNEA D, IORDACHE S, et al. A review of airborne particulate matter effects on young children's respiratory symptoms and diseases[J]. Atmosphere, 2018, 9(4):150.DOI:10.3390/atmos9040150. |
[6] | CHOI S, KIM K H, KIM K, et al. Association between post-diagnosis particulate matter exposure among 5-year cancer survivors and cardiovascular disease risk in three metropolitan areas from south Korea[J]. Int J Environ Res Public Health, 2020, 17(8):2841.DOI:10.3390/ijerph17082841. |
[7] | ATKINSON R W, KANG S, ANDERSON H R, et al. Epidemiological time series studies of PM2.5 and daily mortality and hospital admissions:a systematic review and meta-analysis[J]. Thorax, 2014, 69(7):660-665.DOI:10.1136/thoraxjnl-2013-204492. |
[8] | ZHANG Y, BOCQUET M, MALLET V, et al. Real-time air quality forecasting,part I: history,techniques,and current status[J]. Atmos Environ, 2012, 60:632-655.DOI:10.1016/j.atmosenv.2012.06.031. |
[9] | 李锋, 朱彬, 安俊岭, 等. 2013年12月初长江三角洲及周边地区重霾污染的数值模拟[J]. 中国环境科学, 2015, 35(7):1965-1974. |
LI F, ZHU B, AN J L, et al. Modeling study of a severe haze episode occurred over the Yangtze River Delta and its surrounding regions during early December,2013[J]. China Environ Sci, 2015, 35(7):1965-1974.DOI:10.3969/j.issn.1000-6923.2015.07.008. | |
[10] | 周广强, 谢英, 吴剑斌, 等. 基于WRF-Chem模式的华东区域PM2.5预报及偏差原因[J]. 中国环境科学, 2016, 36(8):2251-2259. |
ZHOU G Q, XIE Y, WU J B, et al. WRF-Chem based PM2.5 forecast and bias analysis over the East China region[J]. China Environ Sci, 2016, 36(8):2251-2259.DOI:10.3969/j.issn.1000-6923.2016.08.002. | |
[11] | DENNIS R L, BYUN D W, NOVAK J H, et al. The next generation of integrated air quality modeling:EPA's models-3[J]. Atmos Environ, 1996, 30(12):1925-1938.DOI:10.1016/1352-2310(95)00174-3. |
[12] | CHEN Q Q, TAYLOR D. Transboundary atmospheric pollution in southeast Asia:current methods,limitations and future developments[J]. Crit Rev Environ Sci Technol, 2018, 48(16/17/18):997-1029.DOI:10.1080/10643389.2018.1493337. |
[13] | SHIMADERA H, KOJIMA T, KONDO A. Evaluation of air quality model performance for simulating long-range transport and local pollution of PM2.5 in Japan[J]. Adv Meteorol, 2016, 2016:5694251.DOI:10.1155/2016/5694251. |
[14] | 郑毅, 朱成璋. 基于深度信念网络的PM2.5预测[J]. 山东大学学报(工学版), 2014, 44(6):19-25. |
ZHENG Y, ZHU C Z. A prediction method of atmospheric PM2.5 based on DBNs[J]. J Shandong Univ (Eng Sci), 2014, 44(6):19-25.DOI:10.6040/j.issn.1672-3961.1.2014.180. | |
[15] | 曲悦, 钱旭, 宋洪庆, 等. 基于机器学习的北京市PM2.5浓度预测模型及模拟分析[J]. 工程科学学报, 2019, 41(3):401-407. |
QU Y, QIAN X, SONG H Q, et al. Machine-learning-based model and simulation analysis of PM2.5 concentration prediction in Beijing[J]. Chin J Eng, 2019, 41(3):401-407.DOI:10.13374/j.issn2095-9389.2019.03.014. | |
[16] | 李建新, 刘小生, 刘静, 等. 基于MRMR-HK-SVM模型的PM2.5浓度预测[J]. 中国环境科学, 2019, 39(6):2304-2310. |
LI J X, LIU X S, LIU J, et al. Prediction of PM2.5 concentration based on MRMR-HK-SVM model[J]. China Environ Sci, 2019, 39(6):2304-2310. DOI:10.3969/j.issn.1000-6923.2019.06.009. | |
[17] | 宋国君, 国潇丹, 杨啸, 等. 沈阳市PM2.5浓度ARIMA-SVM组合预测研究[J]. 中国环境科学, 2018, 38(11):4031-4039. |
SONG G J, GUO X D, YANG X, et al. ARIMA-SVM combination prediction of PM2.5 concentration in Shenyang[J]. China Environ Sci, 2018, 38(11):4031-4039.DOI:10.3969/j.issn.1000-6923.2018.11.005. | |
[18] | 康俊锋, 黄烈星, 张春艳, 等. 多机器学习模型下逐小时PM2.5预测及对比分析[J]. 中国环境科学, 2020, 40(5):1895-1905. |
KANG J F, HUANG L X, ZHANG C Y, et al. Hourly PM2.5 prediction and its comparative analysis under multi-machine learning model[J]. China Environ Sci, 2020, 40(5):1895-1905.DOI:10.19674/j.cnki.issn1000-6923.2020.0213. | |
[19] | KAYES I, SHAHRIAR S A, HASAN K, et al. The relationships between meteorological parameters and air pollutants in an urban environment[J]. Glob J Environ Sci Manag, 2019, 5(3):265-278.DOI:10.22034/GJESM.2019.03.01. |
[20] | 王黎明, 吴香华, 赵天良, 等. 基于距离相关系数和支持向量机回归的PM2.5浓度滚动统计预报方案[J]. 环境科学学报, 2017, 37(4):1268-1276. |
WANG L M, WU X H, ZHAO T L, et al. A scheme for rolling statistical forecasting of PM2.5 concentrations based on distance correlation coefficient and support vector regression[J]. Acta Sci Circumstantiae, 2017, 37(4):1268-1276.DOI:10.13671/j.hjkxxb.2016.0345. | |
[21] | JUHOS I, MAKRA L, TÓTH B. Forecasting of traffic origin NO and NO2 concentrations by support vector machines and neural networks using principal component analysis[J]. Simul Model Pract Theory, 2008, 16(9):1488-1502.DOI:10.1016/j.simpat.2008.08.006. |
[22] | 王占山, 李云婷, 陈添, 等. 2013年北京市PM2.5的时空分布[J]. 地理学报, 2015, 70(1):110-120. |
WANG Z S, LI Y T, CHEN T, et al. Spatial-temporal characteristics of PM2.5 in Beijing in 2013[J]. Acta Geogr Sin, 2015, 70(1):110-120.DOI:10.11821/dlxb201501009. | |
[23] | 郭立力, 赵春江. 十折交叉检验的支持向量机参数优化算法[J]. 计算机工程与应用, 2009, 45(8):55-57. |
GUO L L, ZHAO C J. Optimizing parameters of support vector machine's model based on genetic algorithm[J]. Comput Eng Appl, 2009, 45(8):55-57.DOI:10.3778/j.issn.1002-8331.2009.08.017. | |
[24] | CHEN T, TONG H, BENESTY M. Xgboost: eXtreme gradient boosting[M]. London: Sage Publications, 2016:931-961. |
[25] | 方匡南, 吴见彬, 朱建平, 等. 随机森林方法研究综述[J]. 统计与信息论坛, 2011, 26(3):32-38. |
FANG K N, WU J B, ZHU J P, et al. A review of technologies on random forests[J]. Stat Inf Forum, 2011, 26(3):32-38.DOI:10.3969/j.issn.1007-3116.2011.03.006. | |
[26] | KE G, MENG Q, FINLEY T, et al. LightGBM: a highly efficient gradient boosting decision tree[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems, New York: ACM, 2017:3149-3157. |
[27] | 桑应宾. 基于K近邻的分类算法研究[D]. 重庆: 重庆大学, 2009. |
SANG Y B. Research of classification algorithm based on K nearest neighbor[D]. Chongqing: Chongqing University, 2009. | |
[28] | CARRERA-GARCÍA L, MUCHART J, LAZARO J J, et al. Pediatric SMA patients with complex spinal anatomy:implementation and evaluation of a decision-tree algorithm for administration of nusinersen[J]. Eur J Paediatr Neurol, 2021, 31:92-101.DOI:10.1016/j.ejpn.2021.02.009. |
[29] | LI X, PENG L, YAO X J, et al. Long short-term memory neural network for air pollutant concentration predictions:method development and evaluation[J]. Environ Pollut, 2017, 231(Pt 1):997-1004.DOI:10.1016/j.envpol.2017.08.114. |
[30] | PENG H C, LONG F H, DING C. Feature selection based on mutual information:criteria of max-dependency,max-relevance,and min-redundancy[J]. IEEE Trans Pattern Anal Mach Intell, 2005, 27(8):1226-1238.DOI:10.1109/TPAMI.2005.159. |
[31] | BAE J E, CHOI H, SHIN D W, et al. Fine particulate matter (PM2.5) inhibits ciliogenesis by increasing SPRR3 expression via c-Jun activation in RPE cells and skin keratinocytes[J]. Sci Rep, 2019, 9(1):3994.DOI:10.1038/s41598-019-40670-y. |
[32] | Ministry of Environmental Protection. HJ 633-2012: Technical regulation on ambient air quality index (AQI) (on trial)[EB/OL]. [2021-05-10]. http://www.mee.gov.cn/ywgz/fgbz/bz/bzwb/jcffbz/201203/W020120410332725219541.pdf. |
[33] | MA X Y, JIA H L, SHA T, et al. Spatial and seasonal characteristics of particulate matter and gaseous pollution in China: implications for control policy[J]. Environ Pollut, 2019, 248:421-428.DOI:10.1016/j.envpol.2019.02.038. |
[34] | 曾昭亮, 郭建平, 马大喜. 基于江西地区多卫星数据的气溶胶立体分布研究[J]. 大气与环境光学学报, 2016, 11(5):391-400. |
ZENG Z L, GUO J P, MA D X. Research of aerosol three-dimensional distribution based on multi-satellite data over Jiangxi[J]. J Atmos Environ Opt, 2016, 11(5):391-400.DOI:10.3969/j.issn.1673-6141.2016.05.007. |