Novel applications of Machine Learning to Network Traffic Analysis

avoid this problem, two possible solutions are: 1) Firth logistic regression [4] or 2) Bayesian logistic regression [5]. We used the second approach, although the results from both methods are very close.
The next method we tried was random forest, using the default parameters provided by the R package randomForest for all training runs. Random forest gave the most accurate results among the non-time-series methods.
Finally, we examined GBM. Since, as we will detail later on, our problem appears to suffer more from bias than from variance (over-fitting), we expected good results from this method. We trained with the default parameters provided by the R package xgboost. We decided not to tune the parameters for each SIM, because the number of SIMs (and training rounds) was too large, so the same default parameters were used for all training runs. This affected GBM performance, since GBM is sensitive to fine parameter tuning. We believe this is the reason for the poor results from this method, which were worse than expected.
Other standard methods such as Support Vector Machines (SVM) and Artificial Neural Networks (ANN) were not considered in the tests, since tuning their model parameters is highly demanding in processing time and they usually require an exhaustive parameter adjustment, which could not be performed given the number of models to tune (as many as the number of devices).
With prediction accuracy as our main performance indicator, we did not consider other possible indicators such as false and true positive rates, sensitivity and specificity (the proportion of positives or negatives, respectively, that are correctly identified as such), etc. Nevertheless, we did perform a quick assessment of representative false and true positive rates obtained from at least some of the methods' results.
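The indicators named above can be made concrete with a small sketch. The Python below is illustrative only: the thesis experiments were run in R, the function names `confusion_counts` and `indicators` are hypothetical, and the label vectors are invented toy data, not results from the thesis. It assumes binary labels (1 = positive, 0 = negative) with both classes present.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def indicators(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity and false positive rate.

    Assumes at least one positive and one negative in y_true, so that
    no denominator is zero.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),          # same as true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
    }


# Toy data (hypothetical, for illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
result = indicators(y_true, y_pred)
print(result)
```

Accuracy summarizes both classes in a single number, whereas sensitivity and specificity report each class separately; with imbalanced classes the two views can differ noticeably.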
The best way to analyze the behavior of the false and true positive rates is a so-called Receiver Operating Characteristic (ROC) curve. In Figure 4 we present the ROC curve for the results obtained with random forest; in this case the Area Under the Curve (AUC) is 0.956, which is quite good, as an AUC value of 1 is considered a perfect result. Both the ROC curve and the AUC indicate very good results. Figure 4 shows the different false and true positive rates obtained as we change the probability cut-off threshold, which defines which values will be considered as positive or negative. The numbers inside Figure 4 explicitly give some of these cut-off values (all values, from 0 to 1, are shown as a color gradient). For all methods that require a probability cut-off threshold, we used a value of 0.5. From Figure 4 we can see that, for random forest, a cut-off value of 0.5 gives a false positive rate of around 0.1 and a true positive rate of 0.9, both of which are quite good.
Doctoral Thesis: Novel applications of Machine Learning to NTAP
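How a ROC curve and its AUC follow from a sweep of the cut-off threshold can be sketched in a few lines. This is a minimal illustration in Python, not the procedure used in the thesis (which worked in R). The function names are hypothetical, the scores are invented toy data, and a score at or above the cut-off is assumed to count as a positive prediction.

```python
def rates_at_cutoff(y_true, scores, cutoff=0.5):
    """Return (false_positive_rate, true_positive_rate) at one cut-off.

    Assumption: a score >= cutoff is predicted positive.
    """
    y_pred = [1 if s >= cutoff else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return fp / negatives, tp / positives


def roc_points(y_true, scores):
    """Sweep the cut-off over every distinct score, from high to low."""
    points = [(0.0, 0.0)]  # cut-off above every score: nothing is positive
    for cutoff in sorted(set(scores), reverse=True):
        points.append(rates_at_cutoff(y_true, scores, cutoff))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (hypothetical): true labels and predicted probabilities.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.10]

fpr, tpr = rates_at_cutoff(y_true, scores, cutoff=0.5)
auc = auc_trapezoid(roc_points(y_true, scores))
print(f"cut-off 0.5: FPR={fpr:.2f}, TPR={tpr:.2f}, AUC={auc:.4f}")
```

Each cut-off gives one (false positive rate, true positive rate) point; lowering the cut-off moves along the curve toward (1, 1), which is why a single threshold such as 0.5 corresponds to a single point on the curve.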
