
Figure 1. Comparison of ID-CVAE with a typical VAE architecture.

In a VAE, the way we learn the probability distributions P(z/X) and P(X̂/z) is by using a variational approach [26], which translates the learning process into a minimization problem that can easily be formulated in terms of stochastic gradient descent (SGD) in a neural network. In Figure 1, the model parameters θ and φ are used as a shorthand for the architecture and weights of the neural network. These parameters are tuned as part of the VAE training process and are considered constant later on.

In the variational approach, we try to maximize the probability of obtaining the desired data as output by maximizing a quantity known as the Evidence Lower Bound (ELBO) [22]. The ELBO is formed by two parts: (1) a measure of the distance between the probability distribution P(z/X) and a reference probability distribution of the same nature (actually a prior distribution for z), where the distance usually used is the Kullback-Leibler (KL) divergence; and (2) the log likelihood of X under the probability distribution P(X̂/z), that is, the probability of obtaining the desired data (X) from the final probability distribution that produces X̂.

All learned distributions are parameterized probability distributions, meaning that they are completely defined by a set of parameters (e.g., the mean and variance of a normal distribution). This is very important in the operation of the model, as we rely on these parameters, obtained as network node values, to model the associated probability distributions P(X̂/z) and P(z/X).

Based on the VAE model, our proposed method, ID-CVAE, is similar to a VAE, but instead of using exclusively the same data for the input and output of the network, we additionally use the labels of the samples as an extra input to the decoder block (Figure 1, lower diagram). That is, in our case, using the NSL-KDD dataset, which provides samples with 116 features and a class label with five possible values, we will have a vector of features (of length 116) as both input and output, and its associated label (one-hot encoded in a vector of length 5) as an extra input. Having the labels as an extra input leads to the decoder probability distributions being conditioned on the latent variable and the labels (instead of exclusively on the latent variable z), while the encoder block does not change (Figure 1, lower diagram).

This apparently small change, adopting the labels as an extra input, turns out to be an important difference, as it allows one to:
• Add extra information into the decoder block, which is important to create the required binding between the vector of features and the labels.
• Perform classification with a single training step, using all training data.
• Perform feature reconstruction. An ID-CVAE learns the distribution of feature values through a mapping to the latent distributions, from which features can later be recovered in the case of incomplete input samples (missing features).

In Figure 2 we present the elements of the loss function to be minimized by SGD for the ID-CVAE model. We can see that, as mentioned before, the loss function is made up of two parts: a KL divergence part and a log likelihood part. The second part takes into account how probable it is to generate X using the distribution P(X̂/z, L), that is, it is a distance between X and X̂.
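Figure 2 itself is not reproduced on this page, so the following is only a standard way of writing the two parts just described; the choice of a standard normal prior P(z) = N(0, I) is the usual VAE convention and is assumed here rather than taken from the thesis:

\mathrm{ELBO}(X) \;=\; \underbrace{\mathbb{E}_{P(z/X)}\!\left[\log P(\hat{X}/z)\right]}_{\text{log likelihood}} \;-\; \underbrace{D_{\mathrm{KL}}\!\left(P(z/X)\,\|\,P(z)\right)}_{\text{KL divergence}}, \qquad P(z)=\mathcal{N}(0,I)

\mathcal{L}_{\text{ID-CVAE}}(X,L) \;=\; D_{\mathrm{KL}}\!\left(P(z/X)\,\|\,P(z)\right) \;-\; \mathbb{E}_{P(z/X)}\!\left[\log P(\hat{X}/z,L)\right]

The second expression is the negative ELBO with the decoder conditioned on the labels L, which is the quantity minimized by SGD in the ID-CVAE case.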
The KL divergence part can be understood as a distance between the distribution P(z/X) and a prior distribution for z. By minimizing this distance, we prevent P(z/X) from departing too much from its prior, so that this term finally acts as a regularization term. The nice feature of this regularization term is that it is adjusted automatically: it is not necessary to perform cross-validation to tune a hyper-parameter associated with the regularization, as is needed in other models (e.g., the parameter λ in ridge regression).
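As a concrete illustration of the two-part loss and of the labels entering only the decoder, a minimal PyTorch sketch follows. The framework choice, layer sizes, latent dimension, and the use of binary cross-entropy for the reconstruction term are assumptions made for illustration; only the 116-feature input vector and the 5-value one-hot label come from the text.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of an ID-CVAE-style model; not the thesis implementation.
N_FEATURES = 116   # feature vector length (NSL-KDD setup described in the text)
N_CLASSES = 5      # one-hot label length (from the text)
LATENT_DIM = 10    # assumed latent size, a hyper-parameter

class IDCVAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder P(z/X): maps features to the mean and log-variance of a
        # Gaussian over the latent variable z. It does NOT see the labels.
        self.enc = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU())
        self.enc_mu = nn.Linear(64, LATENT_DIM)
        self.enc_logvar = nn.Linear(64, LATENT_DIM)
        # Decoder P(X̂/z, L): conditioned on z AND the one-hot label.
        self.dec = nn.Sequential(
            nn.Linear(LATENT_DIM + N_CLASSES, 64), nn.ReLU(),
            nn.Linear(64, N_FEATURES), nn.Sigmoid())  # features assumed scaled to [0, 1]

    def forward(self, x, label_onehot):
        h = self.enc(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        # Reparameterization trick keeps the sampling step differentiable for SGD.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        x_hat = self.dec(torch.cat([z, label_onehot], dim=1))
        return x_hat, mu, logvar

def idcvae_loss(x, x_hat, mu, logvar):
    # Negative ELBO: reconstruction (log likelihood) term plus
    # KL(P(z/X) || N(0, I)); the KL term needs no tuned weight, acting as the
    # automatic regularizer discussed above.
    recon = F.binary_cross_entropy(x_hat, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

During training, each batch of feature vectors is passed through the encoder and decoder together with its one-hot labels, and idcvae_loss is minimized by SGD; at test time the same decoder can be queried with candidate labels for classification or feature recovery.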