Journal of Nanjing Forestry University (Natural Sciences Edition) ›› 2019, Vol. 43 ›› Issue (04): 109-116. doi: 10.3969/j.issn.1000-2006.201807059

• Research article •

Research on feature selection based on single feature classification accuracy

DU Xuehui, MENG Chun*, LIU Meishuang

  1. (College of Engineering and Technology, Northeast Forestry University, Harbin 150040, China)
  • Online:2019-07-22 Published:2019-07-22
  • Supported by:
    Received: 2018-07-29; Revised: 2018-12-17. Funding: General Program of the National Natural Science Foundation of China (31570547). First author: DU Xuehui (305071681@qq.com), ORCID (0000-0003-0244-6420). *Corresponding author: MENG Chun (504973901@qq.com), associate professor, ORCID (0000-0003-2270-8782).

Abstract: 【Objective】 With the rapid development of remote sensing technology, the information extracted during image interpretation has become increasingly complex and diverse. More features are often added to improve land-cover classification accuracy, but this tends to introduce redundant information that lowers classification efficiency and sometimes accuracy. Using random forest (RF) and support vector machine (SVM) classifiers, this study explores a way to reduce feature dimensionality in remote sensing classification while maintaining classification accuracy. 【Method】 Part of the Fuxing Forest Farm in Antu County, Jilin Province, was taken as the study area, with a 2015 Landsat-8 image as the data source. Nineteen indicators were extracted as classification features: spectral information (red, green, blue, near-infrared and shortwave infrared bands), vegetation indices (NDVI, enhanced vegetation index, ratio vegetation index and bare soil vegetation index), texture (homogeneity, mean, second moment, variance, dissimilarity, contrast, entropy and correlation) and topographic information (slope and aspect). With feature selection by RF-estimated feature importance as the control, features were selected according to the classification accuracy of each single feature in the RF and SVM classifiers, respectively. The selected features were then used both with and without principal component analysis (PCA) and classified with the RF and SVM classifiers, and the classification accuracy was evaluated to determine the optimal combination of features and classifier. 【Result】 ① Selecting features by SVM single-feature classification accuracy, applying PCA to the selected features, and then classifying with RF gave the best classification performance among the methods compared: at a feature dimension of 5, the overall accuracy was 0.86 and the Kappa coefficient was 0.83. Compared with classification using all features, this improved classification accuracy and reduced the feature dimension, thereby improving classification efficiency. RF classification with features selected by RF feature importance also achieved high accuracy, but the accuracy fluctuated considerably when the feature dimension was below 7: it rose to a maximum of 0.88 at a feature dimension of 4, then dropped sharply to 0.83 and remained at about that level. By contrast, when features were selected by single-feature classification accuracy, the classification accuracy changed more smoothly; for the best-performing method above, it varied within a range of about 0.02. ② When features were selected by single-feature classification accuracy in the RF and SVM classifiers, the SVM classifier was generally more accurate than RF in the subsequent classification. SVM classification of features selected by RF single-feature accuracy, and RF classification of PCA-transformed features selected by SVM single-feature accuracy, both achieved higher accuracy than using SVM or RF alone for both feature selection and classification. 【Conclusion】 ① Feature selection based on single-feature classification accuracy can reduce feature dimensionality while maintaining classification accuracy, and at lower dimensions the resulting classification accuracy is more stable than with features selected by feature importance. ② With this method, different classifiers select somewhat different features and yield different classification accuracies; using multiple classifiers for feature selection and classification performs better than using a single classifier. ③ At low to medium dimensions, the classification accuracy of the RF classifier may depend on the order in which features are input, and applying PCA to the input features helps improve the classifier's accuracy and stability.
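
The ranking step and the two accuracy measures named in the abstract (overall accuracy and the Kappa coefficient) can be illustrated with a minimal sketch. This is not the authors' implementation. To keep it dependency-free, a one-dimensional nearest-centroid classifier stands in for the RF and SVM classifiers, the PCA step is omitted, and the function names and toy values are hypothetical.

```python
from collections import Counter
from statistics import mean


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the reference class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: (po - pe) / (1 - pe), where pe is chance agreement."""
    n = len(y_true)
    po = overall_accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    pe = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (po - pe) / (1 - pe)  # undefined when pe == 1


def centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier (assumption): assign each value to the class
    whose training mean for this single feature is closest."""
    classes = sorted(set(train_y))
    centroids = {
        c: mean(x for x, y in zip(train_x, train_y) if y == c) for c in classes
    }
    return [min(classes, key=lambda c: abs(x - centroids[c])) for x in test_x]


def rank_features_by_single_accuracy(train, train_y, test, test_y):
    """Score every feature on its own, then sort by accuracy (highest first).

    train and test map a feature name to its list of sample values.
    """
    scores = {
        name: overall_accuracy(
            test_y, centroid_predict(train[name], train_y, test[name])
        )
        for name in train
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy values invented for illustration only; they are not data from the paper.
train_y = ["forest", "forest", "water", "water"]
train = {"ndvi": [0.8, 0.7, 0.1, 0.2], "slope": [5.0, 20.0, 8.0, 19.0]}
test_y = ["forest", "water", "forest", "water"]
test = {"ndvi": [0.75, 0.15, 0.6, 0.05], "slope": [18.0, 17.0, 4.0, 7.0]}

ranking = rank_features_by_single_accuracy(train, train_y, test, test_y)
top_k = [name for name, _ in ranking[:1]]  # keep the k best features (k = 1 here)
print(ranking)  # [('ndvi', 1.0), ('slope', 0.5)]
```

In a fuller version, the stand-in classifier would be replaced by an RF or SVM classifier, the top-ranked features would be added one dimension at a time (optionally after PCA), and each resulting classification would be scored with `overall_accuracy` and `cohen_kappa`.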

CLC number: