Self-Organizing Map for Image Classification



The Use of the Self-Organizing Map Method and Feature
Selection in an Image Classification System

Introduction:

An Artificial Neural Network (ANN) is an information processing system whose characteristics resemble biological neural tissue. ANNs provide a technology that helps solve problems which would otherwise require expert reasoning or computer-based routines. Typical ANN applications include classification (clustering), association, pattern recognition, forecasting, and analysis.
In this example, the ANN method applied is the Self-Organizing Map (SOM), used to classify images into a set of five classes.
The SOM method uses unsupervised learning: the network does not use the class membership of the training samples, but instead uses the information in a group of neurons to modify its local parameters.
The SOM system adaptively classifies samples (the image data X) into classes by competitively selecting winning neurons and modifying their weights.




Methodology:

Reduce the dimensionality of the images using the PCA algorithm.
Decompose the image feature matrix into a set of matrices using the LSA algorithm.
Classify the images using the Self-Organizing Map.

Feature Vector Representation:

Before the images can be grouped into clusters, two preprocessing steps are required: color conversion and a histogram, which together produce a feature vector for each image.

Color Conversion:

The color conversion referred to here is the conversion of RGB color images (24 bits) into grayscale (8 bits), so that the color model becomes simpler, with each pixel taking a grey level between 0 and 255. The conversion formula is:

Gray = 0.299 R + 0.587 G + 0.114 B

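The conversion can be sketched as follows. This is an illustrative implementation, not the authors' code; the function name is made up here, and the ITU-R BT.601 luminance weights (0.299, 0.587, 0.114) are the usual choice for this conversion and are assumed:

```python
import numpy as np

def rgb_to_grayscale(rgb):
    """Convert a 24-bit RGB image (H x W x 3) to 8-bit grayscale.

    Assumes the common ITU-R BT.601 luminance weighting; the post's
    original formula image did not survive, so this is a sketch.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return gray.astype(np.uint8)  # grey levels 0..255
```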
Histogram:

A color histogram is one of the statistical feature techniques that can be used to extract a feature vector from a data set of images, video, or text. The feature vector generated by the color histogram is a set of probability values h_i = n_i / n, where n_i is the number of pixels of intensity i that appear in the image and n is the total number of pixels in the image.
Before we can use the SOM method, we need to introduce the algorithms that interact with it in the classification task.
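The histogram feature vector described above can be sketched as follows (an illustrative implementation; the function name is an assumption):

```python
import numpy as np

def color_histogram(gray_image, levels=256):
    """Normalized grey-level histogram: h_i = n_i / n, where n_i is the
    number of pixels with intensity i and n is the total pixel count."""
    gray_image = np.asarray(gray_image)
    counts = np.bincount(gray_image.ravel(), minlength=levels)
    return counts / gray_image.size  # probabilities sum to 1
```

For a 256-level grayscale image, this yields the 256-dimensional feature vector that the post later stacks into the 256 x 250 feature matrix.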


Feature Vector Selection:

Feature vector selection focuses on two methods, PCA (Principal Component Analysis) and LSA (Latent Semantic Analysis), which form the preprocessing stage of the classification process.
The purpose of feature vector selection is to reduce the K-dimensional matrix obtained from the feature vector stage to fewer than K dimensions without losing the important information it contains.


Principal Component Analysis (PCA) algorithm:

The PCA method is a global feature selection algorithm first proposed by Hotelling (1933) as a way to reduce the dimension of a space represented by statistical variables (x_i, i = 1, 2, ..., n) that are mutually correlated.
Over the course of its development, the PCA algorithm (also called the Hotelling transformation) has come to be used to reduce noise and to extract the essential features or characteristics of data before the classification process.
A global feature selection technique was chosen in this case because the images to be classified have relatively low frequency content (low level), so the PCA method is quite suitable.
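A minimal PCA reduction along the lines described above can be sketched as follows (an illustrative eigendecomposition-based version; the function name is an assumption, and the post does not specify implementation details):

```python
import numpy as np

def pca_reduce(X, k):
    """Reduce a (d x m) feature matrix (d dimensions, m samples)
    to (k x m) by projecting onto the k leading principal components."""
    # Center each feature dimension
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean
    # Covariance matrix over the feature dimensions (d x d)
    cov = Xc @ Xc.T / (X.shape[1] - 1)
    # eigh returns eigenvalues in ascending order; reverse for largest first
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, ::-1][:, :k]   # k leading principal components
    return top.T @ Xc               # projected data, shape (k, m)
```

Applied to the 256 x 250 feature matrix used later in the post, `pca_reduce(X, 100)` would produce the 100 x 250 reduced matrix.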

Latent Semantic Analysis (LSA) algorithm:

LSA is a statistical method originally used in Natural Language Processing (NLP), a branch of Artificial Intelligence, to analyze plain-text data or documents.
LSA, also known as LSI (Latent Semantic Indexing), can also be applied to image data: it decomposes the initial matrix into three correlated matrices.
This decomposition technique is called Singular Value Decomposition (SVD), and it is the core of LSA.
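The SVD step of LSA, keeping only the r largest singular values, can be sketched as follows (an illustrative version; the function name is an assumption):

```python
import numpy as np

def truncated_svd(A, r):
    """Decompose A into U * S * V^T (the three correlated matrices of
    LSA/LSI) and keep only the r largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :r], np.diag(s[:r]), Vt[:r, :]
```

The product of the three truncated matrices is a rank-r approximation of the original feature matrix.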

The SOM neural network algorithm is as follows:

Given a feature vector matrix of size k x m (k is the number of feature vector dimensions, m is the number of data), initialize:

- The number of desired classes (clusters) j
- The number of components i of the feature vector matrix (k is the number of matrix rows)
- The number of vectors X_m,i = the amount of data (matrix columns)
- The initial weights W_ji, chosen randomly in the interval 0 to 1
- The initial learning rate α(0)
- The number of iterations (e epochs)

Run the iterations from the first to the last epoch.
For each image vector, from 1 to m, calculate the distance to every neuron:

D(j) = Σ_i (W_ji − X_i)²

for all j. Then:
- Determine the minimum value of D(j) (the winning neuron).
- Update the weights of the neuron j with the minimum D(j):


W_ji(new) = W_ji(old) + α(t) (X_i − W_ji(old))

Modify the learning rate for the next iteration:

α(t + 1) = 0.5 α(t)

where t runs from the first iteration to e.
Test the termination condition: the iteration stops when the difference between W_ji and the W_ji of the previous iteration is very small; the weights have then converged, so the iteration can be stopped.
Use the converged weights W_ji to group the feature vector of each image, by calculating the distance of the vector to the optimal weights.
Divide the images (X_m) into classes:



If D(1) < D(2), D(3), D(4), D(5), then the image belongs to class 1.
If D(2) < D(1), D(3), D(4), D(5), then the image belongs to class 2.
If D(3) < D(1), D(2), D(4), D(5), then the image belongs to class 3.
If D(4) < D(1), D(2), D(3), D(5), then the image belongs to class 4.
If D(5) < D(1), D(2), D(3), D(4), then the image belongs to class 5.
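The full SOM procedure above (random weights, winner selection by minimum D(j), weight update, halving learning rate, convergence test, and final class assignment) can be sketched as follows. This is an illustrative implementation, not the authors' program; the function name, seed handling, and convergence threshold are assumptions:

```python
import numpy as np

def som_classify(X, n_classes=5, alpha=0.5, epochs=500, seed=0):
    """Competitive SOM-style classifier for a (k x m) feature matrix X.

    Returns one class index per column (image). Follows the steps in
    the text: D(j) = sum_i (W_ji - X_i)^2, winner update
    W_j += alpha * (x - W_j), and alpha(t+1) = 0.5 * alpha(t) per epoch.
    """
    k, m = X.shape
    rng = np.random.default_rng(seed)
    W = rng.random((n_classes, k))            # initial weights in [0, 1)
    for _ in range(epochs):
        W_prev = W.copy()
        for col in range(m):
            x = X[:, col]
            D = ((W - x) ** 2).sum(axis=1)    # distance to every neuron
            j = int(np.argmin(D))             # winning neuron
            W[j] += alpha * (x - W[j])        # move winner toward sample
        alpha *= 0.5                          # learning-rate decay
        if np.abs(W - W_prev).max() < 1e-6:   # convergence test
            break
    # Assign each image to the class with the minimal distance D(j)
    D_all = ((X.T[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return D_all.argmin(axis=1)
```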


Implementation:

[Figure: flowchart of the image classification system]
The figure shows that several processing steps are required before the images can be classified into the five classes used in this study: the flower, animal, car, river, and mountain classes.

The images used in this study were 250 in total, with 50 images in each class.


In the color histogram step, the feature vectors of the 250 images are extracted, generating an i x m feature vector matrix of size 256 x 250; 256 is the number of grey levels (0 to 255) from the grayscale color conversion. This matrix is then reduced by r rows (dimensions) through the PCA and LSA feature selection algorithms. The result is a new feature matrix of size k x m (k is the dimension after reduction, m is the total number of images).
So if the initial image matrix is 256 x 250, then after feature selection with, for example, a reduction of r = 156 dimensions, the matrix becomes 100 x 250.
The k x m matrix is then classified into five classes by the SOM neural network algorithm, with the following network parameters:
- Number of classes (j) = 5
- Number of vector components (i) = 50; 100; 150
- Number of X vectors = 250
- Initial weights (W_ji) = 0 to 1 (random)
- Initial learning rate (α) = 0.5
- Total iterations or epochs (e) = 500
The experimental results are measured by the percentage accuracy of the SOM in classifying the images into the correct class, using an application program, where the accuracy formula is:

Accuracy = (number of correctly classified images / total number of images) × 100%

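The accuracy calculation is straightforward (an illustrative helper; the function name is an assumption):

```python
def classification_accuracy(predicted, actual):
    """Percentage of images assigned to their true class."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)
```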
Conclusions:

1. The PCA and LSA feature selection methods are precise enough to be implemented in the image classification system, because they can reduce the dimensions of the image matrix.
2. Image classification techniques can be used to simplify and speed up image search and retrieval, because an image can be looked up directly in the appropriate class without searching each class one by one.








