2) Matrix Factorization
The matrix factorization approach is another representation of pLSA. The word frequency matrix that defines the dataset is very large and sparse: it has one row per document d, and its number of columns equals the number of distinct words k that appear in the corpus. The matrix is sparse because each document uses only a small percentage of the words, depending on its particular topic. Since most entries are zero and carry no specific information, dimensionality reduction is needed for the word frequency matrix. This can be achieved by approximating the co-occurrence matrix (denoted by F) as a product of two low-rank (thinner) ...
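To make the sparsity argument concrete, here is a minimal sketch in Python. The toy corpus, the vocabulary size, and the choice of two topics are assumptions made for illustration only. It builds a small document-by-word frequency matrix and compares the storage needed for the full matrix with that of two thin factors.

```python
import numpy as np

# Toy corpus (hypothetical): each document is a list of word ids.
docs = [
    [0, 0, 1],
    [1, 2, 2, 2],
    [3, 4],
    [4, 4, 5, 3],
]
n_docs = len(docs)
n_words = 6    # vocabulary size (k in the text)
n_topics = 2   # assumed number of latent topics

# Build the document-by-word frequency matrix F.
F = np.zeros((n_docs, n_words), dtype=int)
for d, words in enumerate(docs):
    for w in words:
        F[d, w] += 1

# Fraction of entries that are zero.
sparsity = np.mean(F == 0)

# Entries in the full matrix versus two thin factors of shape
# (documents x topics) and (topics x words).
full_size = n_docs * n_words
factored_size = n_docs * n_topics + n_topics * n_words

print(F)
print(f"zero fraction: {sparsity:.3f}, full: {full_size}, factored: {factored_size}")
```

On a corpus this small the saving is modest; the gap grows quickly when the number of documents and words is far larger than the number of topics.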
Thus a histogram is obtained for each image; together these form the corpus (inverted index), which indicates the frequency of occurrence of each word in the documents. The observed corpus is then used as the input to pLSA, which outputs the probability of occurrence of a topic in a document, P(z|d), and the probability of occurrence of a word in a given topic, P(w|z). The former is of vital use later. The query vector is the histogram, or frequency matrix, of the test image, and indicates how many times each word occurs in it. pLSA is applied to this query vector, and the result is a topic-document probability vector, which is then compared with the probability vectors obtained during training.
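The training step described above can be sketched as follows. This is an illustrative EM implementation of pLSA, not the exact procedure of any cited work; the function name `plsa`, the iteration count, the random initialisation, and the toy matrix are all assumptions.

```python
import numpy as np

def plsa(F, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM on a (documents x words) count matrix F.

    Returns P(z|d) with shape (documents, topics) and
    P(w|z) with shape (topics, words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = F.shape
    eps = 1e-12  # guards against division by zero

    # Random, row-normalised starting points.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        post = p_z_d[:, :, None] * p_w_z[None, :, :]  # (docs, topics, words)
        post /= post.sum(axis=1, keepdims=True) + eps

        # M-step: re-estimate both factors from expected counts.
        weighted = F[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z

# Hypothetical histograms: 4 images, 6 visual words.
F = np.array([
    [2, 1, 0, 0, 0, 0],
    [0, 1, 3, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 2, 1],
])
p_z_d, p_w_z = plsa(F, n_topics=2)
print(np.round(p_z_d, 3))  # one topic distribution per training image
```

Each row of `p_z_d` is the topic vector kept for later comparison, while `p_w_z` is the word-topic table that is reused unchanged at test time.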
The testing phase follows the same procedure of finding the vocabulary of the testing images. The difference is that the word-topic probability P(w|z) learned in training is kept constant and used as an input in the testing phase. The k-nearest-neighbour search algorithm is then applied to find the vector among the training images that is nearest to the query vector. Once this nearest neighbour is obtained, it reveals the category to which the test image most probably belongs.
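A rough sketch of this testing phase is given below. It assumes that P(w|z) and the training topic vectors are already available; here they are hand-written toy values. It also assumes Euclidean distance for the nearest-neighbour search, which the text does not specify. The function names and the category labels are hypothetical.

```python
import numpy as np

def fold_in(query_counts, p_w_z, n_iter=50):
    """Estimate P(z|d_query) for one histogram while P(w|z) stays fixed."""
    eps = 1e-12
    n_topics = p_w_z.shape[0]
    p_z_q = np.full(n_topics, 1.0 / n_topics)  # uniform start
    for _ in range(n_iter):
        # E-step: topic posterior for every word, given the current P(z|d_query).
        post = p_z_q[:, None] * p_w_z              # (topics, words)
        post /= post.sum(axis=0, keepdims=True) + eps
        # M-step: update only the topic mixture; P(w|z) is not touched.
        p_z_q = (post * query_counts[None, :]).sum(axis=1)
        p_z_q /= p_z_q.sum() + eps
    return p_z_q

def knn_label(query_vec, train_vecs, train_labels, k=1):
    """Majority label among the k training vectors closest to query_vec."""
    dists = np.linalg.norm(train_vecs - query_vec, axis=1)  # Euclidean (assumed)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_labels[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy values standing in for the outputs of training (hypothetical).
p_w_z = np.array([
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.3, 0.4, 0.3],
])
train_vecs = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
train_labels = np.array(["coast", "coast", "forest", "forest"])

query_hist = np.array([0, 0, 0, 2, 1, 1])  # histogram of the test image
p_z_q = fold_in(query_hist, p_w_z)
label = knn_label(p_z_q, train_vecs, train_labels, k=3)
print(np.round(p_z_q, 3), label)
```

Keeping `p_w_z` fixed means the test image is described in the same topic space as the training images, so the two sets of vectors can be compared directly.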
pLSA is a robust methodology for scene classification, and this has given it a strong footing in various other fields as well. In 2011, Gong Cheng and his team developed a technique for automatic landslide detection from remote-sensing imagery. It used a scene classification method based on the bag-of-visual-words (BoVW) representation in combination with the unsupervised probabilistic latent semantic analysis (pLSA) model and the k-nearest neighbour (k-NN) classifier. The results were then compared with the more conventional maximum-likelihood classification (MLC) approach (Jia and Richards 1994) using the same remote-sensing imagery. The proposed technique outperformed the MLC approach, especially when using CSIFT in combination with pLSA: the detection accuracy was 16.42% higher and the false alarm rate was 13.45% lower than that obtained using MLC. The results are summarized below and will be very helpful in landslide inventory mapping and landslide hazard assessment in landslide-prone areas.
• Zhiliang Wang, Rong Wang, Xirong Ma. Indoor Scene Recognition Based on the Weighting Spatial Information Fusion. Second International Conference on Intelligent System Design and Engineering Application.
• Biao Jin, Wenlong Hu, and Hongqi Wang. Image Classification Based on pLSA Fusing Spatial Relationships Between Topics. IEEE Signal Processing Letters, vol. 19, no. 3,
• Gong Cheng, Lei Guo, Tianyun Zhao, Junwei Han, Huihui Li & Jun Fang. Automatic landslide detection from remote-sensing...