DU Xuehui, MENG Chun*, LIU Meishuang. Research on feature selection based on single feature classification accuracy[J]. Journal of Nanjing Forestry University (Natural Science Edition), 2019, 43(04): 109-116. [doi:10.3969/j.issn.1000-2006.201807059]

Research on feature selection based on single feature classification accuracy

Journal of Nanjing Forestry University (Natural Science Edition) [ISSN: 1000-2006 / CN: 32-1161/S]

Volume:
43
Issue:
2019, No. 04
Pages:
109-116
Section:
Research Article
Publication date:
2019-07-24

Article Information

Title:
Research on feature selection based on single feature classification accuracy
Article number:
1000-2006(2019)04-0109-08
Author(s):
DU Xuehui, MENG Chun*, LIU Meishuang
(College of Engineering and Technology, Northeast Forestry University, Harbin 150040, Heilongjiang, China)
Keywords:
feature selection; single feature classification accuracy; Landsat-8 satellite; random forest (RF); support vector machine (SVM); remote sensing classification
CLC number:
S757.2
DOI:
10.3969/j.issn.1000-2006.201807059
Document code:
A
Abstract:
【Objective】With the rapid development of remote sensing technology, the information extracted during image interpretation is becoming increasingly complex and diverse. To improve land-cover classification accuracy, more feature information is often added, but this frequently introduces redundancy and can lower classification efficiency and even accuracy. Using random forest (RF) and support vector machine (SVM) classifiers, this study explores a method that reduces feature dimensionality while maintaining accuracy in remote sensing classification.
【Method】Part of the Fuxing Forest Farm in Antu County, Jilin Province, was taken as the study area, and a 2015 Landsat-8 image was used as the data source. A total of 19 indicators were extracted as classification feature variables. These were spectral information (red, green, blue, near-infrared and shortwave infrared bands), vegetation indices (NDVI, enhanced vegetation index, ratio vegetation index and bare soil vegetation index), texture (homogeneity, mean, second moment, variance, dissimilarity, contrast, entropy and correlation) and topographic information (slope and aspect). Feature selection based on the feature importance estimated by the RF classifier served as the control. Features were also selected according to the classification accuracy of each single feature in the RF classifier and in the SVM classifier. Principal component analysis (PCA) was applied to the selected features, and these results were kept separate from those without PCA. The RF and SVM classifiers were each used for classification. Finally, the classification accuracy was evaluated to determine the optimal combination of features and classifier.
【Result】① The best-performing method selected features by their single-feature classification accuracy in SVM, applied PCA to the selected features and then classified with RF. With a feature dimension of 5, it reached an overall accuracy of 0.86 and a Kappa coefficient of 0.83. Compared with classification using all features, it improved classification accuracy, reduced feature dimensionality and thus raised classification efficiency. RF classification of features selected by RF feature importance also achieved high accuracy. However, its accuracy fluctuated considerably when the feature dimension was below 7. It peaked at 0.88 at a dimension of 4, dropped sharply to 0.83 and then stayed at about that level. In contrast, accuracy changed more gently when features were selected by single-feature classification accuracy. For the best-performing method above, accuracy fluctuated within a range of about 0.02. ② When features were selected by single-feature classification accuracy in the RF and SVM classifiers, the SVM classifier was generally more accurate than RF in the subsequent classification. Two cross-classifier combinations outperformed using SVM or RF alone for both feature selection and classification. The first was SVM classification of features selected by RF single-feature classification accuracy. The second was RF classification of PCA-transformed features selected by SVM single-feature classification accuracy.
【Conclusion】① Feature selection based on single-feature classification accuracy can reduce feature dimensionality while maintaining classification accuracy. At low dimensions, features selected this way classify more stably than features selected by feature importance. ② Under this selection method, different classifiers select somewhat different features and yield different accuracies. Using multiple classifiers for feature selection and classification performs better than using a single classifier. ③ At low and medium dimensions, the classification accuracy of the RF classifier may depend on the order in which features are input. Applying PCA to the input features helps improve the classification accuracy and stability of the classifier.
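The vegetation-index features listed in the Method are per-pixel band arithmetic. The abstract does not give the authors' formulas. This minimal sketch therefore assumes the common textbook definitions of NDVI, the ratio vegetation index (RVI) and the enhanced vegetation index (EVI, with the usual coefficients 2.5, 6, 7.5 and 1). It also assumes surface-reflectance inputs on a 0-1 scale. On Landsat-8 OLI, blue, red and near-infrared are bands 2, 4 and 5. The helper name `vegetation_indices` is hypothetical. The bare soil vegetation index is left out because its definition is not stated.

```python
from typing import Dict


def vegetation_indices(blue: float, red: float, nir: float) -> Dict[str, float]:
    """Per-pixel indices from surface reflectance (assumed 0-1 scale).

    Hypothetical helper using textbook formulas, not necessarily the paper's.
    Landsat-8 OLI: blue = band 2, red = band 4, NIR = band 5.
    """
    ndvi = (nir - red) / (nir + red)  # normalized difference vegetation index
    rvi = nir / red                   # ratio vegetation index
    # Enhanced vegetation index with the commonly used coefficients
    # G = 2.5, C1 = 6, C2 = 7.5, L = 1.
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    return {"NDVI": ndvi, "RVI": rvi, "EVI": evi}
```

For a made-up vegetated pixel with blue 0.04, red 0.05 and NIR 0.35, this gives NDVI 0.75 and RVI 7.0. Applied band-wise over a raster, each index becomes one feature layer.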
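The eight texture measures in the Method are standard gray-level co-occurrence matrix (GLCM) statistics. The sketch below computes them for one small window of quantized gray levels. Several settings are assumptions, because the abstract does not state the window size, offset or quantization used: the pixel offset (one pixel to the right), the number of gray levels, the symmetric counting and the natural-log entropy. `glcm` and `texture_measures` are hypothetical helpers, not any particular software package's API.

```python
import math
from typing import Dict, List


def glcm(window: List[List[int]], levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Symmetric, normalized co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = window[r][c], window[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count the reverse pair too, so P is symmetric
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def texture_measures(p: List[List[float]]) -> Dict[str, float]:
    """The eight GLCM statistics named in the abstract, from a normalized matrix p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mean = sum(i * v for i, _, v in cells)  # symmetric matrix: row mean == column mean
    var = sum(v * (i - mean) ** 2 for i, _, v in cells)
    cov = sum(v * (i - mean) * (j - mean) for i, j, v in cells)
    return {
        "mean": mean,
        "variance": var,
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "dissimilarity": sum(v * abs(i - j) for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        "second_moment": sum(v * v for _, _, v in cells),
        "correlation": cov / var if var > 0 else 0.0,  # undefined for a flat window
    }
```

In an image workflow, this would be evaluated in a moving window around each pixel to produce one texture layer per measure.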
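One plausible reading of the selection procedure has three steps. First, classify with each feature on its own. Then rank the features by that single-feature accuracy. Finally, add them in rank order and record accuracy at each feature dimension. The sketch below takes the classifier as a pluggable `evaluate` callback. So that it runs without machine-learning libraries, it uses a leave-one-out nearest-centroid classifier as a stand-in; this is not RF or SVM. In practice, `evaluate` would fit a classifier such as scikit-learn's `RandomForestClassifier` or `SVC` on the chosen columns and score held-out samples. All function names here are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# An evaluator maps a list of feature (column) indices to a classification accuracy.
Evaluator = Callable[[Sequence[int]], float]


def rank_by_single_feature_accuracy(n_features: int, evaluate: Evaluator) -> List[Tuple[int, float]]:
    """Classify with each feature alone and sort features by that accuracy, best first."""
    scores = [(f, evaluate([f])) for f in range(n_features)]
    return sorted(scores, key=lambda fs: fs[1], reverse=True)


def accuracy_by_dimension(ranking: List[Tuple[int, float]], evaluate: Evaluator) -> List[float]:
    """Accuracy using the top-k ranked features, for k = 1 .. number of features."""
    order = [f for f, _ in ranking]
    return [evaluate(order[:k]) for k in range(1, len(order) + 1)]


def nearest_centroid_loo(X: List[List[float]], y: List[str]) -> Evaluator:
    """Stand-in evaluator (not RF/SVM): leave-one-out nearest-centroid accuracy."""

    def evaluate(cols: Sequence[int]) -> float:
        correct = 0
        for t in range(len(X)):
            sums: Dict[str, List[float]] = {}
            counts: Dict[str, int] = {}
            for s in range(len(X)):
                if s == t:
                    continue  # hold sample t out of the class centroids
                acc = sums.setdefault(y[s], [0.0] * len(cols))
                for k, c in enumerate(cols):
                    acc[k] += X[s][c]
                counts[y[s]] = counts.get(y[s], 0) + 1

            def sq_dist(label: str) -> float:
                # Squared distance from sample t to the centroid of `label`.
                return sum((X[t][c] - sums[label][k] / counts[label]) ** 2
                           for k, c in enumerate(cols))

            if min(sums, key=sq_dist) == y[t]:
                correct += 1
        return correct / len(X)

    return evaluate
```

Running the ranking once with an RF-based evaluator and once with an SVM-based one would give two rankings that can differ, as the Result notes.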
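The Method applies principal component analysis (PCA) to the selected features before classification. The abstract does not say whether features were standardized or how many components were kept. The sketch below simply centres the data, forms the sample covariance matrix and extracts the leading eigenvectors by power iteration with deflation, in plain Python. The returned scores (projections onto the components) are what would be passed to a classifier. The `pca` helper and its parameters are hypothetical; for real image stacks, `numpy.linalg.eigh` or scikit-learn's `PCA` would be the usual choice.

```python
import math
import random
from typing import List, Tuple


def pca(X: List[List[float]], n_components: int, iters: int = 1000, seed: int = 0
        ) -> Tuple[List[List[float]], List[List[float]], List[float]]:
    """Covariance PCA via power iteration with deflation.

    Returns (scores, components, variances); components are unit vectors
    ordered by decreasing variance.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(d)]
    Z = [[row[k] - means[k] for k in range(d)] for row in X]  # centre each feature
    # Sample covariance matrix (d x d).
    C = [[sum(z[a] * z[b] for z in Z) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components: List[List[float]] = []
    variances: List[float] = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # remaining covariance is zero
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v = variance along v.
        lam = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        variances.append(lam)
        # Deflation: remove this component so the next pass finds the next one.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    scores = [[sum(z[k] * comp[k] for k in range(d)) for comp in components] for z in Z]
    return scores, components, variances
```

The component variances sum to the total variance when all components are kept, which is a quick sanity check on the result.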
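The overall accuracy (OA) and Kappa coefficient reported in the Result are both computed from a confusion matrix. OA is the share of samples on the diagonal, p_o. Kappa is κ = (p_o − p_e)/(1 − p_e), where p_e is the agreement expected by chance from the row and column totals. A minimal sketch follows. The name `overall_accuracy_and_kappa` is hypothetical, and the matrix used to check it is invented for illustration, not the paper's data.

```python
from typing import List, Tuple


def overall_accuracy_and_kappa(cm: List[List[int]]) -> Tuple[float, float]:
    """cm[i][j] = number of samples of reference class i predicted as class j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(k)) / n            # observed agreement = OA
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")  # undefined if p_e == 1
    return p_o, kappa
```

For the invented matrix [[45, 5], [10, 40]], OA is 0.85, p_e is 0.5 and κ is 0.7. Kappa sits below OA because it discounts chance agreement, which matches the pattern of the paper's reported values (0.86 and 0.83).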

Memo:
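Received: 2018-07-29; Revised: 2018-12-17. Funding: General Program of the National Natural Science Foundation of China (31570547). First author: DU Xuehui (305071681@qq.com), ORCID (0000-0003-0244-6420). *Corresponding author: MENG Chun (504973901@qq.com), associate professor, ORCID (0000-0003-2270-8782).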
Last Update: 2019-07-22