Novel applications of Machine Learning to Network Traffic Analysis

Table 5. Classification results when the training set is augmented with an increasing number of synthetic samples. Predictions are made on the NSL-KDD Test dataset.

The values in Table 5 are color-coded (same code as Table 1). Performance clearly improves as the number of synthetic samples increases. There is also a clear difference between the first three columns and the next three, for almost all options and classifiers. This difference provides evidence that employing synthetic data increases classifier performance, while simply repeating the original data does not provide any significant advantage. Importantly, this holds across very different classifiers. Table 5 shows that, for Option C, performance always increases for all classifiers when additional synthetic data are employed. Moreover, using a balanced dataset (last columns) provides the best results, at least for Option C, which is the option we have chosen as our best model.

Finally, we compare VGM with seven SOTA synthetic over-sampling algorithms: (1) SMOTE [9], (2) SMOTE Borderline [10], (3) SMOTE+ENN [12, 7], (4) SMOTE+Tomek [12, 7], (5) ADASYN [14], (6) SMOTE-SVM [11] and (7) EasyEnsemble [15, 7]. Table 6 compares two classification performance metrics, accuracy and F1 score [20], obtained when different well-known classifiers are trained with synthetic data generated by the aforementioned over-sampling algorithms. To avoid bias due to the specific effectiveness of synthetic data with a particular classifier, we repeat the experiment with four classifiers: random forest, multinomial logistic regression, linear SVM and MLP. In all cases, the classifiers are trained with a balanced dataset constructed using the different synthetic data generation algorithms. The base dataset used to generate the synthetic data is the NSL-KDD Training dataset.
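As an illustration of what constructing a balanced training set involves, the following minimal sketch computes how many synthetic samples each class needs and appends them. It assumes that "balanced" means every class is raised to the size of the largest class, which the text does not state explicitly. The names `synthetic_samples_needed`, `balance` and the `generate` callback are hypothetical; `generate` stands in for any of the over-sampling methods compared here and is not the thesis implementation.

```python
from collections import Counter


def synthetic_samples_needed(labels, target=None):
    """Return {class: number of synthetic samples required to reach target}.

    Assumption: when no target is given, every class is raised to the size
    of the most frequent class.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    return {cls: max(0, target - n) for cls, n in counts.items()}


def balance(samples, labels, generate):
    """Append synthetic samples until all classes have the same size.

    generate(cls, n) is a hypothetical callback that returns a list of n
    synthetic samples for class cls (e.g. produced by a generative model
    or by an over-sampling algorithm).
    """
    out_x, out_y = list(samples), list(labels)
    for cls, n in synthetic_samples_needed(labels).items():
        out_x.extend(generate(cls, n))
        out_y.extend([cls] * n)
    return out_x, out_y


# Toy example with an imbalanced label distribution.
toy_labels = ["normal"] * 5 + ["dos"] * 3 + ["u2r"] * 1
toy_samples = [[float(i)] for i in range(len(toy_labels))]


def dummy_generator(cls, n):
    # Placeholder generator: returns constant vectors, for illustration only.
    return [[0.0] for _ in range(n)]


balanced_x, balanced_y = balance(toy_samples, toy_labels, dummy_generator)
print(Counter(balanced_y))
```

In this toy case the two minority classes receive 2 and 4 synthetic samples respectively, so that all three classes end up with 5 samples each.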
All the prediction metrics (accuracy and F1) are obtained with the NSL-KDD Test dataset. We can observe (Table 6) that VGM exhibits better average performance than the other algorithms; some give better results for a specific classifier, but on average VGM gives the best results. The results depend on both the classifier and the oversampling method. The intention here is to show that VGM provides average results as good as any SOTA oversampling method, and in many cases better. We used the weighted average provided by scikit-learn [31] to calculate the F1 score. The values in Table 6 are color-coded in a manner similar to previous tables.

Doctoral Thesis: Novel applications of Machine Learning to NTAP - 177
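The weighted-average F1 score mentioned above can be sketched in plain Python as follows. This is an illustrative re-implementation, not the code used in the thesis: it computes the F1 score of each class and averages them weighted by support (the number of true instances of each class), which is the behaviour scikit-learn documents for `f1_score(..., average="weighted")`. The function names and the toy labels are assumptions made for the example.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    tp = Counter(t for t, p in pairs if t == p)  # correct predictions per class
    fp = Counter(p for t, p in pairs if t != p)  # wrongly predicted as class p
    fn = Counter(t for t, p in pairs if t != p)  # missed instances of class t
    total = 0.0
    for cls, n in support.items():
        denom = 2 * tp[cls] + fp[cls] + fn[cls]
        # Assumption: F1 is taken as 0 when the class has no TP, FP or FN.
        f1 = 2 * tp[cls] / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Toy labels, for illustration only (not results from the thesis).
y_true = ["normal", "normal", "normal", "dos", "dos", "probe"]
y_pred = ["normal", "normal", "dos", "dos", "dos", "normal"]

print(accuracy(y_true, y_pred))
print(weighted_f1(y_true, y_pred))
```

Weighting by support means that frequent classes dominate the average, so on an imbalanced test set the weighted F1 can stay high even when a rare class is never predicted correctly, as happens with the "probe" class in the toy example.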
