were not as expected but unusually poor. The next time-series method tried was Exponential Smoothing [7], again training the algorithm for each particular SIM and using as predicted variable the on/off activity during the predicted period. The results for exponential smoothing are not good either (see Figure 8). The periodic nature of the signals and the rapid loss of memory of this method, caused by the exponentially decreasing weight it gives to past observations, may be the reason for its poor performance.
The following method tried was ARIMA [8]. As with the other methods, we trained the algorithm for each particular SIM, using as predicted variable the on/off activity during the predicted period. The results for ARIMA are very good. The main reason behind these good results is the fine automatic parameter adjustment done by the auto.arima function from the forecast R package. The auto.arima function automatically adjusts the parameters "(p,d,q)(P,D,Q)m", also taking seasonality into account. Using an algorithm that adjusts these parameters automatically is critical, since manual adjustment using the ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) [8] was not feasible: adjusting the parameters for each SIM would be extremely computationally demanding.
Finally, ARIMAX was applied [8], again training the algorithm for each particular SIM, using the day of the week (7 possible values) and the hour of day (24 values) as predictor variables (also named covariates) and the on/off activity in a one-hour period as the predicted variable. For this particular problem, preparing the additional data set of covariates required by ARIMAX was an easy task, since the day of the week and the hour of day are known in advance for both the training and the testing data. On other occasions, an additional prediction task is necessary to prepare the testing covariates.
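As an illustration of why the covariates are easy to prepare here, the following minimal sketch builds the day-of-week and hour-of-day covariates for a training window and for the period to be predicted. It is written in Python for illustration only and is not the code used in this work, which relies on the forecast R package. The helper names (`hourly_timestamps`, `calendar_covariates`), the start date, the 7-day/1-day split, and the dummy encoding with a dropped reference level are all assumptions made for this example; the text does not state how the covariates were encoded.

```python
from datetime import datetime, timedelta


def hourly_timestamps(start, n_hours):
    """Return n_hours consecutive hourly timestamps beginning at start."""
    return [start + timedelta(hours=i) for i in range(n_hours)]


def calendar_covariates(timestamps):
    """Dummy-encode day of week (7 values) and hour of day (24 values).

    Assumption: the first level of each factor (Monday, hour 0) is dropped
    as the reference level, giving 6 + 23 = 29 columns per row.
    """
    rows = []
    for ts in timestamps:
        # weekday() is 0 for Monday ... 6 for Sunday; Monday is the reference.
        dow = [1 if ts.weekday() == d else 0 for d in range(1, 7)]
        # hour is 0 ... 23; hour 0 is the reference.
        hod = [1 if ts.hour == h else 0 for h in range(1, 24)]
        rows.append(dow + hod)
    return rows


# Hypothetical split: 7 days of hourly training data, then 1 day to predict.
train_start = datetime(2017, 1, 2, 0, 0)  # a Monday, chosen arbitrarily
train_ts = hourly_timestamps(train_start, 7 * 24)
test_ts = hourly_timestamps(train_ts[-1] + timedelta(hours=1), 24)

# Both matrices depend only on the calendar, so the testing covariates
# are available without any additional prediction step.
x_train = calendar_covariates(train_ts)
x_test = calendar_covariates(test_ts)
```

In a workflow like the one described, matrices of this kind would play the role of the external regressors (the `xreg` argument of auto.arima at training time, and of the corresponding forecast call at prediction time). Dropping one level per factor is a common way to avoid a rank-deficient regressor matrix, but it is a design choice of this sketch.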
For ARIMAX we also used the auto.arima function from the forecast R package, providing an additional data set with the values of the covariates for each value of the training data (the on/off activity variable in the training data). This additional data set is necessary, since ARIMAX uses these external covariates as predictors on top of the usual past values and errors of the time series. The results from ARIMAX were the best of all the applied methods. The use of the ARIMAX [9] method was not initially planned, but was considered after observing the good behavior of ARIMA and the good results of the non-time-series methods (logistic regression and random forest), even when using just two predictors (time of day and day of the week). The SIM’s activity distribution seems to have a clear structure defined both by its past activity and, equally important, by external predictors such as the day of the week and the hour of day. The ARIMAX method is able to combine both aspects: the “ARIMA part” takes into account the past of the signal, and the “exogenous covariates part” incorporates other external variables into the prediction. In this case we have considered precisely the day of the week and the hour of day as these “exogenous” external variables.
Similarly to the non-time-series methods, we have come to the conclusion that, once a minimum number of days is available for training, the performance does not seem to improve by increasing the number of days. In fact, we have seen that ARIMA does not improve its performance when the number of training days is increased beyond 5-7 days (see Figure 5 for similar behavior of the non-time-series methods). ARIMA performs slightly better than random forest (the best non-time-series method), but from a practical point of view their performances are identical (see Figure 8). From a computing perspective, the fastest of all methods is logistic regression (similar to Bayesian logistic regression).
The ranking is followed by ETS, random forest, GBM, ARIMA,
Doctoral Thesis: Novel applications of Machine Learning to NTAP - 104