Novel applications of Machine Learning to Network Traffic Analysis


Text from PDF Page: 169

parameters (e.g. the mean and variance for a normal distribution). The final objective is to produce an output X̂ with minimum difference to the input X. There are different ways to achieve this objective; one is to use sampling methods such as Markov Chain Monte Carlo (MCMC), but the path taken by the VAE is different. The VAE uses a variational approach that tries to maximize the log-likelihood of X by maximizing a quantity known as the Evidence Lower Bound (ELBO) [1]. The ELBO is formed by two parts: (1) a measure of the distance between the probability distribution q(Z|X) and a reference probability distribution of the same nature (in fact a prior distribution for Z), where the distance usually employed is the Kullback-Leibler (KL) divergence, and (2) the log-likelihood of X under the probability distribution p(X̂|Z), which is the probability of obtaining the desired data (X) under the distribution that produces X̂. Using the ELBO, we reduce the problem to an optimization problem based on the maximization of the ELBO, which allows using neural networks with stochastic gradient descent (SGD) as the optimizer. The only remaining problem is how to incorporate the sampling process required by the model into the way SGD operates. To do this, the innovation of the VAE is the so-called "reparameterization trick" [1]. Using this trick, all the variables involved are connected through differentiable layers on which SGD can operate. Based on the VAE model, our proposed method (VGM) is similar to a VAE, but instead of using the same vector of features for the input and output of the network, we add more flexibility by allowing a different input to the network and an optional additional input on the decoder block (Figure 1, lower diagram). We represent this generic input in Figure 1 with the letter I.
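The reparameterization trick described above can be illustrated with a minimal sketch. This is not the thesis code: it assumes a diagonal-Gaussian encoder that outputs a mean `mu` and a log-variance `log_var`, and the function name `reparameterize` is illustrative. Writing the sample as a deterministic function of (mu, log_var) plus an independent noise term is what makes the sampling step differentiable for SGD.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps, with eps ~ N(0, I).

    All randomness lives in eps, so z is a differentiable function of
    the encoder outputs mu and log_var; gradients can flow through it.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Hypothetical encoder output: a batch of 4 samples, latent dimension 2.
mu = np.zeros((4, 2))
log_var = np.zeros((4, 2))  # log_var = 0 means sigma = 1
z = reparameterize(mu, log_var)
```

In an actual VAE the same expression is written with the framework's tensors so that automatic differentiation propagates gradients to `mu` and `log_var` while treating `eps` as a constant.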
The input ๐‘ฐ can be instantiated in two possible ways: as the vector of sample features, as in VAE, or as the vector of labels associated to the samples. That is, the input ๐‘ฐ can be either X or L, where L is the vector of labels. In the VGM architecture, in case we use the vector of features as input to the encoder (as in VAE) then we will employ an additional input to the decoder formed by the vector of labels. In this way we will always have the vector of labels as an input to the network, either as input to the encoder or decoder blocks. To use the labels as input of the generative process is an important difference as it allows generating new synthesized samples using exclusively the labels assigned to these samples. As already pointed out, for intrusion detection data, the generative data process is more difficult, as the features are both continuous and categorical, and we cannot appreciate directly if the synthesized data samples have features with a similar structure to the original ones, that is the reason why using directly the labels is important to be sure we are using meaningful information to characterize the generated samples. In Section 3.3 we will present different variants to the generic architecture for VGM shown in Figure 1. In Figure 2 we present the elements of the loss function to be minimized by SGD for the VGM model. We can see that, as mentioned before, the loss function is made up of two parts: a KL divergence and a log likelihood part. The second part takes into account how ฬ‚ probable is to generate ๐— by using the distribution ๐’‘(๐—/๐™), that is, it is a distance between ๐‘ฟ ฬ‚ and ๐‘ฟ. The KL divergence part can be understood as a distance between the distribution ๐’’(๐’/๐‘ฐ) and a prior distribution for ๐’โก, that we identify as ๐‘žโก๐‘Ÿ๐‘’๐‘“๐‘’๐‘Ÿ๐‘’๐‘›๐‘๐‘’ in Figure 2. 
By minimizing this distance, we prevent q(Z|I) from departing too much from its prior, so the KL term finally acts as a regularization term. The nice feature of this regularization term is that it is adjusted automatically: it is not necessary to perform cross-validation to tune a hyperparameter associated with the regularization, as is needed in other models (e.g. ridge regression, soft-margin support vector machines).

Doctoral Thesis: Novel applications of Machine Learning to NTAP - 167
