show a maximum in performance for a 24-hour period and a minimum around a 12-hour period. This behavior is similar for all methods. Section 5 explains this behavior by connecting the mean global activity of the SIMs with the mean accuracy of the forecasts. Training is very fast (i.e. the algorithms take little time to tune their parameters) for all methods except GBM, which is much slower (see section 3.3).

Figure 3 shows the performance results for the non-time-series methods. The upper diagram presents the mean accuracy over a prediction period of 48 hours, calculated using the process presented in section 2.3 (Figure 1). The lower diagram presents the standard deviation of the accuracy values used to build the upper diagram.

Figure 3. Performance results for non-time-series methods.

As already mentioned, the non-time-series methods explored were logistic regression, Bayesian logistic regression, Random Forest and GBM. All of them are well-known methods with good performance in several areas of application. For each of these methods, the algorithm was trained separately for each SIM, using the day of the week (7 possible values) and the hour of the day (24 values) as predictor variables, and the on/off activity in one-hour periods as the predicted variable. We considered these the most suitable features given the time-series nature of the data. We also tried adding predictors based on other elapsed-time periods (e.g. 2-hour or 4-hour periods), but they did not improve the results significantly. Other available features were not used, since they would have increased the computational requirements substantially. Intuitively, another interesting feature to explore is the customer, since devices from the same customer may have similar traffic patterns (e.g. smart meters from a utility, or connected vehicles from a car manufacturer); this feature could be explored in future work.
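The per-SIM set-up described above can be sketched as follows. This is a minimal, hypothetical illustration, not the implementation used in this work: it assumes pure Python, one-hot encodes the day of the week (7 values) and the hour of the day (24 values) plus an intercept, and fits a plain logistic regression to the hourly on/off activity by full-batch gradient descent with a small L2 penalty. The function names (`encode`, `fit_logreg`, `predict_proba`, `hourly_samples`), the hyperparameters and the two synthetic SIM histories are all invented for the example.

```python
import math
from datetime import datetime, timedelta

N_DAYS, N_HOURS = 7, 24
N_FEATURES = 1 + N_DAYS + N_HOURS  # intercept + one-hot day + one-hot hour


def encode(ts: datetime) -> tuple:
    """Indices of the three active features for the one-hour period starting at ts:
    intercept, day of week (Monday=0) and hour of day (0-23)."""
    return (0, 1 + ts.weekday(), 1 + N_DAYS + ts.hour)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(samples, lr=1.0, l2=1e-3, epochs=300):
    """Full-batch gradient descent on the mean log-loss plus an L2 penalty
    on every weight except the intercept. `samples` holds (indices, on_off) pairs."""
    w = [0.0] * N_FEATURES
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0] * N_FEATURES
        for idx, y in samples:
            err = sigmoid(sum(w[j] for j in idx)) - y  # predicted minus observed
            for j in idx:
                grad[j] += err
        for j in range(N_FEATURES):
            reg = l2 * w[j] if j > 0 else 0.0
            w[j] -= lr * (grad[j] / n + reg)
    return w


def predict_proba(w, ts: datetime) -> float:
    """Estimated probability that the SIM is active in the hour starting at ts."""
    return sigmoid(sum(w[j] for j in encode(ts)))


def hourly_samples(start: datetime, hours: int, is_active):
    """(features, on/off) pairs for consecutive one-hour periods."""
    samples = []
    for k in range(hours):
        ts = start + timedelta(hours=k)
        samples.append((encode(ts), 1 if is_active(ts) else 0))
    return samples


# Hypothetical, noise-free histories: two weeks of hourly activity for two SIMs.
start = datetime(2024, 1, 1)  # a Monday
histories = {
    "sim_office": hourly_samples(
        start, 14 * 24, lambda t: t.weekday() < 5 and 9 <= t.hour < 17),
    "sim_night": hourly_samples(start, 14 * 24, lambda t: t.hour < 6),
}

# One independent model per SIM.
models = {sim: fit_logreg(samples) for sim, samples in histories.items()}

wednesday_11h = datetime(2024, 1, 17, 11)
print(round(predict_proba(models["sim_office"], wednesday_11h), 3))
```

Because each SIM gets its own weight vector, the office-hours device and the night-time device learn opposite hour-of-day profiles from the same 31 indicator features. Gradient descent is written out only to keep the sketch dependency-free; in practice a library fitter (for example scikit-learn's `LogisticRegression`) would normally be used instead.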
The results for logistic regression were quite satisfactory; nevertheless, we ran into a complete separation (also called perfect separation) problem during training. This problem occurs when one or several independent variables can fully predict the outcome; it usually implies over-fitting (results are good for the training set but not as good for unseen data). To

Doctoral Thesis: Novel applications of Machine Learning to NTAP