
Novel applications of Machine Learning to Network Traffic Analysis


Text from PDF Page: 174

4. Results

The objective of this section is twofold. First, to show that the synthetic data is similar but not identical to the original data, and that this similarity is maintained when the data is conditionally partitioned by its class label (Section 4.1). Second, to show that the new synthetic data can be used as new training data, improving the results obtained with several prediction algorithms (Section 4.2).

4.1. Structure of generated data

In this section we show that the synthetic generated data has a probabilistic structure similar to that of the original data. Verifying this similarity is a hard problem, since it involves comparing the probability distributions of multivariate vectors (116 features) with non-Gaussian marginals (discrete and continuous features) and complex joint probability distributions. The challenge is twofold: obtaining the joint probability distributions and comparing them. Methods based on information theory (e.g. Kullback-Leibler (KL) divergence) require an estimate of the joint probabilities, which is very difficult to obtain for high-dimensional variables [25], and many of them (e.g. mutual information and KL divergence) are not practically applicable to multivariate distributions [26]. Other methods, based on multivariate extensions of goodness-of-fit tests, are also difficult to apply given the high dimensionality and the non-Gaussian marginal distributions [27, 28, 29].

Considering the difficulties mentioned above, we have developed two approaches to verify the similarity: (1) extended histograms of the original and synthesized features; and (2) classification results obtained by applying several classification algorithms to the original and synthesized data.

For the first approach, Figure 7 presents extended histograms for the original NSL-KDD training dataset (upper diagram) and for a synthesized dataset created with the same labels as the original one (lower diagram).
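For reference, the KL divergence cited in this section as an example of an information-theoretic measure can be written, for two discrete distributions P and Q over the same joint feature space, as shown here. The notation is ours and is not taken from the thesis.

```latex
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{x \in \mathcal{X}} P(x)\,\log\frac{P(x)}{Q(x)},
\qquad
\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2 \times \dots \times \mathcal{X}_d
```

The sum ranges over every joint configuration x of the d features, so the number of probabilities to estimate grows with the product of the per-feature cardinalities. This is one way to see why estimating the joint distribution becomes impractical for high-dimensional data.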
To visualize the data in Figure 7, we use [30], which makes it possible to visualize and compare the distributions of large datasets. The columns of the diagrams correspond to features. The rightmost four columns are the categorical variables, respectively: protocol (3 values), service (70 values), flag (11 values) and label (5 values). All feature values are ordered according to the alphabetical order of the label. The rows are divided into 100 slots, associated with the 100 bins into which the continuous features have been mapped. The colors in each slot indicate where the mean value lies for that slot, with a different color showing the dispersion (similar to a box plot).

We can observe in Figure 7 that both diagrams present a similar distribution over the features. The intention of the diagram is to show the general similarity of the distributions, giving an overall impression of similarity when comparing the original and synthetic data feature by feature. For this reason we do not give the names of the features in the diagram, since we are not interested here in comparing particular features. It is important to note that the synthesized data is generated from a forward pass of the model (Option C, Section 3.3.3), and that each newly generated set of synthesized data will be different, due to the stochastic nature of the layer of latent variables.

Doctoral Thesis: Novel applications of Machine Learning to NTAP - 172
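The slot layout described for Figure 7 can be approximated in a few lines of Python. The sketch below is one possible reading of that description and is not the tool cited as [30]: it sorts the rows by label, splits them into consecutive slots, and records the mean and standard deviation of each continuous feature in each slot. The function names `binned_summary` and `mean_abs_gap`, the toy data generator, and the gap measure itself are illustrative assumptions and do not come from the thesis.

```python
import random
import statistics


def binned_summary(rows, labels, n_bins=100):
    """Sort rows by label, split them into n_bins consecutive slots, and
    return, per slot, a list of (mean, std) pairs, one per feature."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    if len(rows) < n_bins:
        raise ValueError("need at least one row per slot")
    # Alphabetical ordering by label (sorted() is stable within a label).
    order = sorted(range(len(rows)), key=lambda i: labels[i])
    sorted_rows = [rows[i] for i in order]
    n = len(sorted_rows)
    n_features = len(sorted_rows[0])
    summary = []
    for b in range(n_bins):
        # Integer arithmetic keeps the slots contiguous and non-empty.
        chunk = sorted_rows[b * n // n_bins:(b + 1) * n // n_bins]
        slot = []
        for j in range(n_features):
            values = [row[j] for row in chunk]
            slot.append((statistics.fmean(values), statistics.pstdev(values)))
        summary.append(slot)
    return summary


def mean_abs_gap(summary_a, summary_b):
    """Average absolute difference between slot means of two summaries
    (a hypothetical scalar stand-in for visually comparing two diagrams)."""
    gaps = [
        abs(a[0] - b[0])
        for slot_a, slot_b in zip(summary_a, summary_b)
        for a, b in zip(slot_a, slot_b)
    ]
    return statistics.fmean(gaps)


def make_toy(seed, n=1000):
    """Toy stand-in for a dataset: one continuous feature whose mean
    depends on a made-up class label."""
    rng = random.Random(seed)
    centers = {"dos": 0.0, "normal": 2.0, "probe": 4.0}
    labels = [rng.choice(sorted(centers)) for _ in range(n)]
    rows = [[rng.gauss(centers[lab], 1.0)] for lab in labels]
    return rows, labels


if __name__ == "__main__":
    original = binned_summary(*make_toy(seed=0))
    synthetic = binned_summary(*make_toy(seed=1))
    print("mean absolute gap between slot means:",
          mean_abs_gap(original, synthetic))
```

Because the rows are sorted by label before slotting, two datasets generated with the same label proportions line up slot by slot, which is what allows the upper and lower diagrams to be compared directly.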


Original File Name Searched:

456453_1175348.pdf
