Novel applications of Machine Learning to Network Traffic Analysis

enhanced version of the original KDD-99 dataset, solving the problem of redundant records present in KDD-99. We consider this dataset useful for this work, as we are mainly interested in the generation of synthetic data, for which NSL-KDD provides a sufficient number of samples. Additionally, the distribution of samples among intrusion classes (labels) is quite unbalanced and provides enough variability between training and test data to challenge any method that tries to reproduce the structure of the data.

The NSL-KDD dataset provides 125973 training samples and 22544 test samples, with 41 features, of which 38 are continuous and 3 are categorical (discrete valued). Each training sample has an output label from 23 possible labels (normal plus 22 labels associated with different types of anomaly). The test data has the same number of features (41) and output labels from 38 possible values. That means that the test data contains anomalies not present at training time. The 23 training and 38 testing labels have 21 labels in common; 2 labels appear only in training and 17 labels are unique to the testing data. Around 16% of the samples in the test dataset correspond to labels unique to the test dataset, which were not present at training time.

The existence of new labels at testing time introduces an additional challenge for the learning methods. This challenge is important for verifying the robustness of classifiers, but not for the purpose of this study, which is to synthesize samples associated with existing labels. Therefore, it seems more practical and useful to aggregate labels by categories. As presented in [16], the original labels are associated with 5 categories: NORMAL, PROBE, R2L, U2R and DoS, with the latter four corresponding to an anomaly. The meaning of the 5 categories is as follows:

• NORMAL: There is no attack.
• Denial of Service (DoS): The intention of these attacks is to interrupt some service.
• PROBE: These attacks intend to gain information about the target host.
• User to Root (U2R): These attacks try to obtain root access to the system.
• Remote to Local (R2L): Unauthorized access from a remote machine.

For this work we have used these 5 categories as the labels driving our data generation model.

We have performed an additional data transformation: scaling all NSL-KDD continuous features to the range [0,1] and one-hot encoding all categorical features. This provides a final dataset with 116 features: 32 continuous and 84 with values in {0,1} associated with the three one-hot encoded categorical features. It is important to note that the 3 categorical features (protocol, flag and service) have respectively 3, 11 and 70 distinct values. We will show later the accuracy obtained when synthesizing these features (taking the original ones as reference), and how the different number of values affects the results.

Working with the NSL-KDD dataset, we provide all results using the full training dataset of 125973 samples and the full test dataset of 22544 samples. It is also important to mention that we do not use previously customized training or test datasets, nor subsets of them. Doing so might yield apparently better results, but it would be less objective and would defeat the purpose of having a common reference for comparison.

3.2. Method explained

In Figure 1 we present a diagram comparing VGM and VAE architectures. In a VAE architecture [1] we model the internal structure of the data with an initial neural network (encoder) that approximates the parameters of a probability distribution. Samples drawn from this distribution are the input to a second neural network (decoder), which approximates the parameters of a second probability distribution from which the samples drawn are the final

Doctoral Thesis: Novel applications of Machine Learning to NTAP - 165
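As a minimal sketch of the data transformation described above for the NSL-KDD features (scaling continuous features to [0,1] and one-hot encoding categorical ones), the following pure-Python example works on a hypothetical toy dataset. The function names (min_max_scale, one_hot, preprocess), the column names and the sample values are assumptions made for illustration. They are not taken from the thesis, and the text does not specify the implementation actually used.

```python
# Illustrative sketch only: toy data and helper names are hypothetical.


def min_max_scale(column):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def one_hot(column):
    """One-hot encode a list of category values.

    Returns (vocabulary, rows): the sorted distinct values and, for each
    input value, a list with a single 1 at that value's vocabulary index.
    """
    vocabulary = sorted(set(column))
    index = {value: i for i, value in enumerate(vocabulary)}
    rows = []
    for value in column:
        row = [0] * len(vocabulary)
        row[index[value]] = 1
        rows.append(row)
    return vocabulary, rows


def preprocess(continuous_cols, categorical_cols):
    """Build one feature row per sample: scaled values, then one-hot blocks."""
    scaled = [min_max_scale(col) for col in continuous_cols]
    encoded = [one_hot(col)[1] for col in categorical_cols]
    n_samples = len(scaled[0]) if scaled else len(encoded[0])
    features = []
    for i in range(n_samples):
        row = [col[i] for col in scaled]
        for block in encoded:
            row.extend(block[i])
        features.append(row)
    return features


# Hypothetical toy columns (three samples); not real NSL-KDD records.
duration = [0, 2, 10]
src_bytes = [100, 300, 200]
protocol = ["tcp", "udp", "icmp"]
flag = ["SF", "S0", "SF"]

features = preprocess([duration, src_bytes], [protocol, flag])

# Feature-count arithmetic using the figures stated in the text:
# protocol, flag and service have 3, 11 and 70 distinct values.
one_hot_width = 3 + 11 + 70
total_width = 32 + one_hot_width
```

With the figures given in the text, one_hot_width is 84 and total_width is 116, matching the stated size of the final dataset. Note that this sketch computes the minimum and maximum of each column from the data it is given; how the scaling parameters were chosen for the training and test sets is not stated in the text.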
