
Novel applications of Machine Learning to Network Traffic Analysis

1. Introduction

The objective of a Network Intrusion Detection System (NIDS) is to automate the detection of policy and security violations, and of malicious activities against a host that is part of a network. As the importance and volume of data exchange increase, improving the performance of NIDS becomes increasingly relevant. Most current NIDS are based on supervised machine learning models, which require labeled data for training. For any detection system, access to balanced and diverse training data is fundamental. Intrusion data rarely have these characteristics: network traffic samples are strongly biased toward normal traffic, and traffic associated with anomalous intrusion events is difficult to obtain. It would therefore be very useful to synthesize intrusion data with a structure similar to that of real data. This would avoid investing significant resources in tasks such as obtaining additional intrusion data or manually simulating attacks. The generation of synthetic data that resemble real data, being similar (in a probabilistic sense) but not identical, has long been an objective in image processing [1, 2]. In this area, the features used to train the models are all continuous (pixel intensities). It is also relatively easy to judge whether a generative model is working well, because humans are quite good at recognizing whether the generated images correspond to the desired class of images. More recently, similar work has been carried out on the generation of text and sentences [3, 4]. In this case the features are all discrete, and we can likewise judge directly whether the generated text corresponds to a particular topic.
In the intrusion detection area, the data generation process poses its own difficulties, for several reasons: (1) the features used to identify an intrusion type (label) are both continuous and categorical, (2) the class labels are highly unbalanced, and (3) we cannot directly judge whether a new synthetic sample of a particular class really corresponds to that class (intrusion label). The first and second reasons imply that we need to synthesize discrete and continuous features at the same time, and each type has its own problems. The third requires alternative techniques to show the similarity between original and synthetic data: we need to determine whether two populations of samples (real and synthetic) belong to the same class. The samples are multivariate, high-dimensional vectors with continuous and discrete values, a complex joint probability distribution, and marginals that are neither Gaussian nor of any other easily parameterized form. Under these conditions, it is very difficult to measure the similarity between the probability distributions of the two populations with methods based on information theory (e.g., KL divergence) or with multivariate extensions of goodness-of-fit tests (Section 4.1). For this reason, we have applied alternative and original approaches to evaluate similarity: (1) extended histograms of the original and synthesized features, and (2) showing that classification results are similar when either the original or the synthesized data are used as training or test data for several classifiers (e.g., Random Forest or Multinomial Logistic Regression). This problem does not arise when generating synthetic images or text. As already discussed, in those cases we can discriminate samples that belong to specific objects (images) or topics (text). For samples related to intrusion detection, there is no such good discriminator.
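The two evaluation approaches above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the procedure used in this work. The first two helpers, `histogram_overlap` and `category_overlap`, are hypothetical. They reduce a per-feature histogram comparison to a single number using histogram intersection, which ranges from 0 to 1. The third helper, `train_real_vs_synth`, is also hypothetical. It trains scikit-learn's `RandomForestClassifier` and `LogisticRegression` once on real data and once on synthetic data, then scores both models on the same held-out real data. With its default solver, `LogisticRegression` fits a multinomial model when there are more than two classes. This covers only one of the train/test combinations mentioned in the text. The toy Gaussian data at the end stand in for real features and for the output of a generative model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def histogram_overlap(real, synth, bins=20):
    """Histogram intersection in [0, 1] for one continuous feature (hypothetical helper).

    Both samples are binned on shared edges; 1.0 means identical binned
    distributions, 0.0 means the two samples fall in disjoint bins.
    """
    edges = np.histogram_bin_edges(np.concatenate([real, synth]), bins=bins)
    p, _ = np.histogram(real, bins=edges)
    q, _ = np.histogram(synth, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.minimum(p, q).sum())


def category_overlap(real, synth):
    """Same intersection measure for a categorical feature, using category frequencies."""
    cats = np.union1d(real, synth)
    p = np.array([np.mean(real == c) for c in cats])
    q = np.array([np.mean(synth == c) for c in cats])
    return float(np.minimum(p, q).sum())


def train_real_vs_synth(X_real, y_real, X_synth, y_synth, X_test, y_test, seed=0):
    """Accuracy on held-out real data of each classifier trained on real vs. synthetic data."""
    models = {
        "random_forest": lambda: RandomForestClassifier(n_estimators=100, random_state=seed),
        "logistic_regression": lambda: LogisticRegression(max_iter=1000),
    }
    results = {}
    for name, make in models.items():
        acc_real = accuracy_score(y_test, make().fit(X_real, y_real).predict(X_test))
        acc_synth = accuracy_score(y_test, make().fit(X_synth, y_synth).predict(X_test))
        results[name] = (acc_real, acc_synth)
    return results


# Toy usage: the "synthetic" set is simply a second draw from the same
# distribution, standing in for the output of a generative model.
rng = np.random.default_rng(0)


def toy(n):
    y = rng.integers(0, 3, size=n)
    X = rng.normal(loc=y[:, None] * 2.0, scale=1.0, size=(n, 4))
    return X, y


X_real, y_real = toy(600)
X_synth, y_synth = toy(600)
X_test, y_test = toy(300)
print(histogram_overlap(X_real[:, 0], X_synth[:, 0]))
print(train_real_vs_synth(X_real, y_real, X_synth, y_synth, X_test, y_test))
```

If the synthetic data preserve the class structure, the two accuracies for each classifier should be close. A clear drop when training on synthetic data would point to features or classes that the generator fails to reproduce.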
To appreciate how complex and unclear the relationship is between the distribution of values in a high-dimensional sample and the label associated with that sample, consider the difficulties that adversarial examples [5] pose for a neural network. Adding small perturbations to images can mislead a perfectly tuned neural network into misclassifying them, even when these perturbations do not affect humans' ability to discriminate the images.

Doctoral Thesis: Novel applications of Machine Learning to NTAP - 162
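The effect described above is easiest to see on a linear model. The sketch below is our own illustration, not a method taken from [5] or from this work. It applies a step in the style of the fast gradient sign method to a hypothetical linear classifier with random weights. Every feature moves by the same amount `eps`, in the direction that pushes the score toward the other class, assuming the current prediction is the true label. The helper names `sign_perturbation` and `min_flip_eps` are hypothetical. Because the score changes by `eps` times the L1 norm of the weights, the per-feature step needed to flip the prediction shrinks as the number of features grows. For the random weights used here, it shrinks roughly like 1/sqrt(d).

```python
import numpy as np


def sign_perturbation(x, w, b, eps):
    """Sign-gradient step against a linear classifier with score w @ x + b.

    Each feature moves by exactly eps, in the direction that pushes the
    score toward the opposite side of the decision boundary.
    """
    score = w @ x + b
    direction = -np.sign(score) * np.sign(w)
    return x + eps * direction


def min_flip_eps(x, w, b):
    """Smallest per-feature (L-infinity) step that reaches the decision boundary."""
    return abs(w @ x + b) / np.abs(w).sum()


rng = np.random.default_rng(0)
for d in (10, 100, 10_000):
    w = rng.normal(size=d) / np.sqrt(d)  # weights of a hypothetical trained model
    x = rng.normal(size=d)               # one input with unit-scale features
    eps = 1.01 * min_flip_eps(x, w, 0.0)  # just past the boundary
    x_adv = sign_perturbation(x, w, 0.0, eps)
    flipped = np.sign(w @ x) != np.sign(w @ x_adv)
    print(f"d={d:>6}  eps={eps:.4f}  flipped={flipped}")
```

The flip is guaranteed by construction, because `eps` is set just past the decision boundary. The interesting quantity is how small `eps` becomes relative to the unit scale of each feature as `d` grows.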
