JOURNAL OF NANJING FORESTRY UNIVERSITY ›› 2019, Vol. 43 ›› Issue (04): 109-116. doi: 10.3969/j.issn.1000-2006.201807059


Research on feature selection based on single feature classification accuracy

DU Xuehui, MENG Chun*, LIU Meishuang   

  1. (College of Engineering and Technology, Northeast Forestry University, Harbin 150040, China)
  • Online: 2019-07-22 Published: 2019-07-22

Abstract: 【Objective】 This study used random forest (RF) and support vector machine (SVM) classifiers to explore a feature selection method that maintains classification accuracy while reducing the feature dimension in remote sensing classification.
【Method】 The study area was the Fuxing Forest Farm in Antu County, Jilin Province, and the data source was a 2015 Landsat-8 image. A total of 19 indicators were used as classification features: spectral (red, green, blue, near-infrared and shortwave infrared bands), vegetation index (NDVI, enhanced vegetation index, ratio vegetation index and bare soil vegetation index), texture (homogeneity, mean, second moment, variance, difference, contrast, entropy and correlation) and topographic (slope and aspect) features. Features were selected according to the classification accuracy of each single feature in the RF and SVM classifiers, with selection based on the RF estimate of feature importance serving as a contrast. The selected features were treated in two ways, with and without principal component analysis (PCA), and were then classified using the RF and SVM classifiers. Finally, classification accuracy was evaluated and the optimal combination of features and classifier was determined.
【Result】 The best-performing combination selected features by SVM single-feature classification accuracy, applied PCA to the selected features, and classified them with RF. With this combination the feature dimension was 5, the overall accuracy was 0.86 and the Kappa coefficient was 0.83. Compared with classification using all features, accuracy improved and the feature dimension decreased, which increased classification speed. RF classification of features selected by RF feature importance also achieved high accuracy. However, when the feature dimension was below 7, its accuracy fluctuated greatly, peaking at 0.88 at a feature dimension of 4 and then dropping immediately to 0.83, where it remained. The accuracy obtained with features selected by single-feature classification accuracy changed more slowly; for the best combination described above, accuracy fluctuated within a range of approximately 0.02. Selecting features by single-feature classification accuracy did not change the relative performance of the RF and SVM classifiers: in the subsequent classification, the SVM classifier was more accurate than RF. Two cross-classifier combinations, SVM classification of features selected by RF single-feature accuracy and RF classification of features selected by SVM single-feature accuracy followed by PCA, were compared with combinations that used a single classifier (SVM or RF) for both feature selection and classification; the cross-classifier combinations were more accurate.
【Conclusion】 Feature selection based on single-feature classification accuracy can maintain classification accuracy while reducing the feature dimension. Classification with features selected by this method was more stable than classification with features selected by RF feature importance. Both the selected features and the final classification accuracy differed depending on which classifier supplied the single-feature accuracies. Using different classifiers for feature selection and for classification performed better than using a single classifier for both. At medium and low feature dimensions, the classification accuracy of the RF classifier may depend on the order in which features are input, and applying PCA to the input features may improve the classification accuracy and stability of RF.
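
The selection idea in the abstract, scoring every feature on its own and ranking by that accuracy, and the two reported metrics (overall accuracy and the Kappa coefficient) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the study scored single features with RF and SVM, whereas this sketch substitutes a leave-one-out 1-nearest-neighbour rule so that it needs no third-party library. The function names, the two feature names and the toy values are all hypothetical.

```python
from collections import Counter


def loo_1nn_accuracy(values, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour rule on ONE feature.

    Stand-in scorer (assumption): the study used RF and SVM for this step.
    """
    n = len(values)
    correct = 0
    for i in range(n):
        # Nearest other sample along this single feature.
        j = min((k for k in range(n) if k != i),
                key=lambda k: abs(values[k] - values[i]))
        correct += labels[j] == labels[i]
    return correct / n


def rank_single_features(X, y, names, score=loo_1nn_accuracy):
    """Score each feature column on its own; sort by descending accuracy."""
    scores = [(name, score([row[c] for row in X], y))
              for c, name in enumerate(names)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the reference class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Kappa = (p_o - p_e) / (1 - p_e), with p_e the chance agreement.

    Undefined (division by zero) when p_e equals 1.
    """
    n = len(y_true)
    p_o = overall_accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy samples: column 0 separates the classes, column 1 does not.
names = ["ndvi", "slope"]
X = [[0.10, 5], [0.20, 20], [0.15, 11], [0.80, 6], [0.90, 19], [0.85, 12]]
y = ["soil", "soil", "soil", "forest", "forest", "forest"]

ranking = rank_single_features(X, y, names)
print(ranking)

# Hypothetical predictions, used only to exercise the two metrics.
y_pred = ["soil", "soil", "forest", "forest", "forest", "forest"]
print(overall_accuracy(y, y_pred), cohen_kappa(y, y_pred))
```

In a fuller pipeline the top-ranked features would be kept, optionally transformed with PCA, and then passed to the final classifier; passing a different `score` callable is how an RF- or SVM-based scorer could be swapped in.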

CLC Number: