A face recognition system aims to determine the identity of a face image with the assistance of a stored database of faces labeled with people's identities. It is widely considered one of the most promising biometric techniques and has attracted much attention in the areas of pattern recognition and machine learning. Numerous methods have been proposed over the past decades, and they are briefly summarized in surveys of this area [1-3]. These face recognition algorithms generally consist of two steps: 1) feature extraction and 2) classification.
The extraction of discriminative and stable features from high-dimensional face images remains a challenging, unsolved issue in face recognition systems. Well-extracted features can not only speed up the subsequent classification procedure but also improve classification accuracy. Feature extraction is justified by the observation that the intrinsic dimensionality of face images is much lower than the dimensionality of the image vectors. Dimension reduction methods, in their varied forms, can generally be classified into two groups: linear subspace analysis and manifold learning.
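The low-intrinsic-dimensionality observation above can be illustrated with a small sketch: data generated from only a few latent factors lives in a high-dimensional ambient space, yet a handful of principal components capture almost all of its variance. The dimensions and the synthetic generating process here are illustrative assumptions, not data from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# 100 synthetic "images" of dimension 400, generated from only 5 latent
# factors plus small noise -- mimicking the situation where face images
# have a much lower intrinsic dimensionality than their vector length.
latent = rng.normal(size=(100, 5))
mixing = rng.normal(size=(5, 400))
X = latent @ mixing + 0.01 * rng.normal(size=(100, 400))

pca = PCA().fit(X)

# Nearly all variance is concentrated in the first 5 components.
print(pca.explained_variance_ratio_[:5].sum())
```

Because the noise scale is tiny relative to the latent signal, the first five components account for essentially all of the variance, so reducing from 400 to 5 dimensions loses almost nothing.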
One group of dimension reduction methods is linear subspace analysis. This group has been studied extensively since the application of principal component analysis (PCA), Bayesian maximum likelihood (BML) [6-8], and linear discriminant analysis (LDA) [9,10] to face recognition. Although the projection axes are obtained through different criteria, the low-dimensional features in all of these methods are obtained through projection. By maximizing the variance of the extracted features, the unsupervised PCA is the optimal dimension reduction technique in the sense of reconstruction error. LDA [9,10] is a supervised method that searches for projection axes on which data points of different classes are far apart while data points from the same class are clustered together. While PCA seeks the best representation of the original samples, LDA tries to extract the most discriminative features. Consequently, LDA-based methods generally outperform PCA-based methods in pattern classification problems such as face recognition. However, almost all LDA-based methods suffer from the small sample size (SSS) problem when the dimensionality of the samples is larger than the number of available training samples. To address the SSS problem, Fisherface first employs PCA as a preliminary step to reduce the dimensionality of the face images, and then applies LDA to generate projection axes for further dimension reduction.
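The two-stage Fisherface pipeline described above can be sketched with scikit-learn's `PCA` and `LinearDiscriminantAnalysis`. The synthetic data, its dimensions, and the choice of 15 PCA components are illustrative assumptions chosen to reproduce the SSS regime (far fewer samples than dimensions), not values from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# 5 classes x 4 samples of 1024-dimensional vectors: the SSS regime,
# where the within-class scatter matrix used by LDA is singular.
n_classes, per_class, dim = 5, 4, 1024
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(per_class, dim))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), per_class)

# Step 1 (PCA): reduce to at most n_samples - n_classes dimensions so
# that the subsequent LDA problem becomes well-posed.
pca = PCA(n_components=X.shape[0] - n_classes)
X_pca = pca.fit_transform(X)

# Step 2 (LDA): project onto at most n_classes - 1 discriminant axes.
lda = LinearDiscriminantAnalysis(n_components=n_classes - 1)
X_lda = lda.fit_transform(X_pca, y)

print(X_lda.shape)  # 20 samples, 4 discriminant features
```

Running LDA directly on the 1024-dimensional vectors would fail here, since 20 training samples cannot yield a nonsingular within-class scatter matrix; the preliminary PCA step is what makes the LDA step feasible.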
Thanks to the kernel technique, researchers have proposed nonlinear extensions [11,12,13,22,29] of these linear methods. While KPCA is formulated as a nonlinear form of principal component analysis (PCA), the KFDA methods proposed in  and  are nonlinear schemes of Fisher discriminant analysis (FDA) for, respectively,...
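A minimal sketch of the kernel idea, using scikit-learn's `KernelPCA`: two concentric circles are not separable by any linear projection, but an RBF kernel implicitly maps them into a feature space where one principal component separates them. The data and kernel parameter are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(1)

# Two concentric circles: a classic linearly inseparable configuration.
theta = rng.uniform(0, 2 * np.pi, size=200)
radius = np.repeat([1.0, 3.0], 100)
X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

# Kernel PCA with an RBF kernel performs PCA in the kernel-induced
# feature space, yielding nonlinear components in the input space.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.5)
X_kpca = kpca.fit_transform(X)

print(X_kpca.shape)
```

Linear PCA on this data would merely rotate the circles; the kernel trick is what lets the same eigen-decomposition machinery extract nonlinear structure, which is exactly the motivation for KPCA and KFDA.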