classifier for intrusion detection. The work carries out an extensive comparison of the synthetic data produced by the new method with data produced by classic over-sampling techniques, showing that the proposed method performs better when its output is used as synthetic training data.

7.5.2 Datasets

For this work we have used the NSL-KDD [67] dataset, a classic intrusion detection dataset. It has 32 continuous and 3 categorical features, with an intrusion label of 5 values (Normal, DoS, Probe, R2L and U2R). The dataset is quite unbalanced. We have performed an additional data transformation: scaling all NSL-KDD continuous features to the range [0,1] and one-hot encoding all categorical features. This provides a final dataset with 116 features: 32 continuous and 84 with values in {0,1} associated with the three one-hot encoded categorical features. The three categorical features (protocol, flag and service) have 3, 11 and 70 distinct values, respectively. The accuracy obtained when synthesizing these discrete features (taking the original ones as reference) depends heavily on the cardinality of the feature. We provide all results using the full original training dataset of 125973 samples and the full original test dataset of 22544 samples.

7.5.3 Models

The proposed architecture consists of a VAE that tries to recover an output identical to its inputs (the inputs being the network traffic features used to detect the intrusion class), with one variation on the standard VAE: an additional input to the decoder. This additional input is the one-hot encoded class label. Adding it is critical to improving the model in two directions: it makes the data generation process easier (generation is now conditioned on the class label), and it produces better synthetic data, which is more closely related to the original data in terms of probability distribution conditioned on the class label.
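The conditioning described above can be illustrated with a minimal sketch of the generation step: a latent vector is concatenated with the one-hot class label and passed through a decoder that outputs the 116 features. This is not the thesis implementation. The latent size, the hidden layer size, the sigmoid output, the random untrained weights and all function names are assumptions made only for illustration; only the feature counts and class names come from the text.

```python
import math
import random

# Feature layout taken from the dataset description.
N_CONTINUOUS = 32
CATEGORICAL_CARDINALITIES = {"protocol": 3, "flag": 11, "service": 70}
N_FEATURES = N_CONTINUOUS + sum(CATEGORICAL_CARDINALITIES.values())  # 32 + 84
CLASSES = ["Normal", "DoS", "Probe", "R2L", "U2R"]

# Assumptions: these sizes are not given in the text.
LATENT_DIM = 8
HIDDEN_DIM = 16


def one_hot(index, size):
    """Return a one-hot list of the given size with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def relu(v):
    return max(0.0, v)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def init_layer(rng, n_in, n_out):
    """Random (untrained) weights and zero biases for one dense layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer followed by an activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def generate_sample(rng, params, class_name):
    """Decode one synthetic feature vector conditioned on a class label."""
    z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]      # latent sample
    label = one_hot(CLASSES.index(class_name), len(CLASSES))  # extra decoder input
    decoder_input = z + label                                  # concatenation
    hidden = dense(decoder_input, *params["hidden"], relu)
    # Assumption: sigmoid output, so every feature lies in (0, 1).
    return dense(hidden, *params["output"], sigmoid)


rng = random.Random(0)
params = {
    "hidden": init_layer(rng, LATENT_DIM + len(CLASSES), HIDDEN_DIM),
    "output": init_layer(rng, HIDDEN_DIM, N_FEATURES),
}
sample = generate_sample(rng, params, "U2R")
```

With a trained decoder, calling `generate_sample` repeatedly with a minority label such as "U2R" would be one way to draw extra samples for an under-represented class. Here the weights are random, so the output only demonstrates the shapes involved.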
To arrive at the proposed model, we have analyzed different VAE architecture variants, providing an extensive study of the alternatives. Besides proposing a new architecture based on a conditional VAE, we have used several machine learning techniques to demonstrate that the generated synthetic data can be used to improve the intrusion detection results of several classifiers (Random Forest, Logistic Regression, SVM and MLP). The work also shows that the synthetic data has a similar probability distribution for the features depending on their intrusion classes. We have developed several approaches to verify the similarity: (1) extended histograms of the original and synthesized features; and (2)

Doctoral Thesis: Novel applications of Machine Learning to NTAP