With the rapid growth in the use of computer networks, network and information security has become a prime concern. The basic aim of security is to develop protective software systems that provide three main security goals: confidentiality, integrity, and authentication. An intrusion is any activity that tries to violate these security goals. An intrusion detection system (IDS) plays a key role in identifying such malicious activities.
The term IDS was first introduced by Anderson in 1980. Traditional IDSs have significant drawbacks in handling high-dimensional data, false positives (classifying normal traffic as malicious), and false ...
Section III elaborates the proposed approach. Section IV describes the dataset used and the experimental results, and Section V provides the conclusion along with the future scope.
An intrusion detection system (IDS) can be host-based or network-based. A real-time network intrusion detection framework based on data mining has been presented in earlier work. This paper likewise focuses on network-based IDS, with the aim of improving its performance.
A. Feature selection (FS)
Feature selection is the process of selecting a subset of the available features to reduce the dimensionality of a dataset. In FS, redundant features (those with duplicated values) and irrelevant features (those containing no useful information) are discarded. FS is an effective machine learning technique that helps build an efficient classification system: with a reduced feature subset, time complexity decreases and classifier accuracy can improve.
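As a minimal sketch of the discarding step, the snippet below drops columns that are constant or exact duplicates of an earlier column. Treating a constant column as "irrelevant" and an exact copy as "redundant" is a simplifying assumption made here for illustration; the function name `drop_uninformative` and the toy data are hypothetical.

```python
def drop_uninformative(rows):
    """Return indices of feature columns kept after removing
    constant columns and exact duplicates of earlier columns."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    kept, seen = [], set()
    for idx, col in enumerate(columns):
        if len(set(col)) <= 1:   # constant column: carries no information
            continue
        if col in seen:          # exact copy of a column already kept
            continue
        seen.add(col)
        kept.append(idx)
    return kept


# Toy data: column 1 is constant, column 2 duplicates column 0.
rows = [
    (1, 0, 1, 5),
    (2, 0, 2, 3),
    (3, 0, 3, 8),
]
kept = drop_uninformative(rows)
reduced = [tuple(r[i] for i in kept) for r in rows]
print(kept)     # [0, 3]
print(reduced)  # [(1, 5), (2, 3), (3, 8)]
```

Practical FS methods use softer criteria (for example, correlation thresholds) rather than exact equality, but the effect on dimensionality is the same in kind.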
There are three standard approaches to feature selection: embedded, filter, and wrapper. In the embedded approach, FS occurs as part of the data mining algorithm itself. The filter method selects features independently of the classifier used, whereas the wrapper method selects features specific to the intended classifier. Filter methods use a statistical measure to score features, while wrapper methods use a learning algorithm to find the best feature subset. The wrapper approach is computationally more expensive and slower than the filter approach, but generally gives more accurate results.
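The filter idea can be illustrated with a short sketch that ranks features by the absolute Pearson correlation between each feature and the class label, then keeps the top k. The choice of Pearson correlation as the statistic, the helper names, and the toy data are assumptions for illustration only; no classifier is involved at any point, which is what distinguishes a filter from a wrapper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.
    Returns 0.0 when either sequence has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def filter_select(columns, labels, k):
    """Score each feature column against the labels and return the
    (sorted) indices of the k highest-scoring features."""
    scores = [abs(pearson(col, labels)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


labels = [0, 0, 1, 1]            # 0 = normal, 1 = attack (toy labels)
columns = [
    [1, 2, 3, 4],                # rises with the label
    [5, 5, 5, 5],                # constant, scores 0
    [0, 0, 1, 1],                # identical to the label
    [1, 0, 1, 0],                # uncorrelated with the label
]
print(filter_select(columns, labels, 2))  # [0, 2]
```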
An FS algorithm consists of two main components: an evaluation function and a search algorithm. The evaluation function defines the criterion by which candidate features are scored. According to how they work, search methods can be classified as exponential, sequential, or randomized. Exponential methods have exponential complexity; randomized methods select features at random and can give high accuracy; sequential methods add or remove features one at a time.
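The split between evaluation function and search algorithm can be sketched with a sequential forward search that accepts any evaluation function as a callable. This is an illustrative sketch, not the procedure used in this paper; the function name, the `weights` table, and the size penalty are hypothetical. Passing a statistical score gives a filter-style search, while passing a classifier's validation accuracy would give a wrapper-style search.

```python
def sequential_forward_selection(n_features, evaluate, max_size=None):
    """Greedy sequential search: start from the empty subset and add,
    one at a time, the feature that most improves evaluate(subset).
    Stops when no single addition improves the score."""
    selected = []
    best_score = float("-inf")
    remaining = set(range(n_features))
    limit = n_features if max_size is None else max_size
    while remaining and len(selected) < limit:
        candidate, candidate_score = None, best_score
        for f in sorted(remaining):          # try every one-feature extension
            score = evaluate(selected + [f])
            if score > candidate_score:
                candidate, candidate_score = f, score
        if candidate is None:                # no extension improves the score
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# Hypothetical evaluation function: per-feature usefulness minus a
# fixed penalty of 3 for every feature in the subset.
weights = {0: 10, 1: 2, 2: 8, 3: 0}


def toy_evaluate(subset):
    return sum(weights[f] for f in subset) - 3 * len(subset)


subset, score = sequential_forward_selection(4, toy_evaluate)
print(subset, score)  # [0, 2] 12
```

The search evaluates at most on the order of n² subsets for n features, compared with 2ⁿ for an exhaustive (exponential) search.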
1. Correlation-based Feature Selection (CFS): CFS evaluates the worth of a subset of features using a heuristic. A feature that is highly correlated with the class is considered good and is selected. Within each subset, attributes are chosen by considering both the predictive ability of each individual feature and the degree of redundancy between features. An appropriate correlation measure is therefore needed to identify the most important and effective features. The function for evaluating a feature subset is:
$$\mathrm{Merit}_S = \frac{K \cdot cr_{fc}}{\sqrt{K + K(K-1)\, cr_{ff}}}$$
where $\mathrm{Merit}_S$ is the heuristic merit of a feature subset S containing K features, $cr_{fc}$ is the average feature-class correlation, and $cr_{ff}$ is the average feature-feature correlation. ...
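A small numeric sketch may help show how the merit behaves. It assumes the standard CFS form, in which the merit is K times the average feature-class correlation divided by the square root of K + K(K − 1) times the average feature-feature correlation. The function name and the correlation values below are hypothetical, and correlations are assumed to be non-negative (for example, absolute values).

```python
from math import sqrt


def cfs_merit(feature_class_corrs, feature_feature_corrs):
    """Heuristic merit of a subset of K features.

    feature_class_corrs:   one correlation per feature in the subset
    feature_feature_corrs: one correlation per unordered feature pair
                           (empty when the subset has a single feature)
    """
    k = len(feature_class_corrs)
    if k == 0:
        raise ValueError("subset must contain at least one feature")
    cr_fc = sum(feature_class_corrs) / k
    cr_ff = (
        sum(feature_feature_corrs) / len(feature_feature_corrs)
        if feature_feature_corrs
        else 0.0
    )
    return (k * cr_fc) / sqrt(k + k * (k - 1) * cr_ff)


# Four features, each correlated 0.5 with the class (6 feature pairs).
independent = cfs_merit([0.5] * 4, [0.0] * 6)  # features unrelated to each other
redundant = cfs_merit([0.5] * 4, [1.0] * 6)    # features fully redundant
print(independent, redundant)  # 1.0 0.5
```

With the same feature-class correlation, the fully redundant subset scores no better than a single feature (0.5), whereas the mutually uncorrelated subset scores higher. This is the trade-off between predictive ability and redundancy that the merit is meant to capture.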