Table 4.6 – Spam classification performance. Our performance baseline includes all the features from [Becchetti et al., 2008]. Each row represents adding features from either RAPr or the derivative based on a particular Beta distribution or value of α. The results are averaged over 50 repetitions of 10-fold cross-validation with a 10-bag J48 decision tree classifier. With most of the RAPr and derivative feature sets, we observe an improvement in the f-score. Consequently, these features capture information in the graph that PageRank does not express.

Features                     Precision  Recall  f-score  FP ratio  FN ratio
Baseline                     0.694      0.558   0.618    0.034     0.442
Beta(1.5, 0.5, 0, 0.99)      0.692      0.557   0.617    0.034     0.443
Beta(−0.5, −0.5, 0.3, 0.99)  0.698      0.564   0.624    0.033     0.436
Beta(0.5, 1.5, 0, 0.99)      0.695      0.561   0.621    0.034     0.439
Beta(10, 10, 0.3, 0.7)       0.690      0.560   0.620    0.034     0.442
Beta(1, 1, 0, 1)             0.698      0.562   0.622    0.033     0.438
Beta(2, 16, 0, 1)            0.699      0.562   0.623    0.033     0.438
Derivative (α = 0.75)        0.697      0.563   0.623    0.033     0.437
Derivative (α = 0.85)        0.697      0.561   0.622    0.033     0.439
Derivative (α = 0.95)        0.700      0.560   0.620    0.033     0.440

precision: fraction of pages labeled as spam that are actually spam; recall: fraction of total spam pages identified; f-score: harmonic mean of precision and recall; FP ratio (false positive ratio): fraction of non-spam pages mislabeled as spam; FN ratio (false negative ratio): fraction of spam pages mislabeled as non-spam.

In the table we also add features based on the derivative. For the derivative features, we replace the standard deviation in the previous feature list with the derivative. Both the derivative and RAPr features improve the performance of the classifier, although the improvement is small: the best f-score rises from 0.618 to 0.624 with RAPr features and to 0.623 with derivative features, a gain of roughly half a percentage point in both cases. Using features from the Beta(−0.5, −0.5, [0.3, 0.99]) distribution, we obtain the best classification performance. In some sense, this distribution represents the least-likely surfer behavior.
In contrast, the actual surfer behavior, Beta(1.5, 0.5, [0, 0.99]), yields the lowest f-score of all the experiments (0.617) and fails to improve on the baseline. Unlike many of the other metrics in the baseline feature set, the RAPr metrics are not tuned for spam ranking. Combining RAPr with TrustRank, for example, might achieve even better performance.
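The table legend defines every column as a ratio of confusion-matrix counts. The following is a minimal sketch of those definitions, assuming spam is the positive class; the `Confusion` record, the helper names, and the example counts are hypothetical and do not come from the experiments.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from one evaluation run, with spam as the positive class."""
    tp: int  # spam pages labeled as spam
    fp: int  # non-spam pages labeled as spam
    tn: int  # non-spam pages labeled as non-spam
    fn: int  # spam pages labeled as non-spam


def _ratio(num: int, den: int) -> float:
    """Safe division: report 0.0 when a class or prediction set is empty."""
    return num / den if den else 0.0


def harmonic_mean(p: float, r: float) -> float:
    """f-score: the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def spam_metrics(c: Confusion) -> dict[str, float]:
    precision = _ratio(c.tp, c.tp + c.fp)  # labeled-as-spam pages that are spam
    recall = _ratio(c.tp, c.tp + c.fn)     # spam pages that were identified
    return {
        "precision": precision,
        "recall": recall,
        "f_score": harmonic_mean(precision, recall),
        "fp_ratio": _ratio(c.fp, c.fp + c.tn),  # non-spam mislabeled as spam
        "fn_ratio": _ratio(c.fn, c.tp + c.fn),  # spam mislabeled; equals 1 - recall
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    for name, value in spam_metrics(Confusion(tp=56, fp=25, tn=700, fn=44)).items():
        print(f"{name:>9}: {value:.3f}")
    # Harmonic mean of the averaged baseline precision and recall in Table 4.6.
    print(f"baseline f from averages: {harmonic_mean(0.694, 0.558):.3f}")
```

Two checks follow from these definitions. The FN ratio is 1 − recall, which matches most rows of the table. For the baseline row, the harmonic mean of the averaged precision and recall is about 0.619, slightly above the reported 0.618. Because the harmonic mean is concave, an average of per-run f-scores cannot exceed the f-score of the averaged precision and recall, so this small gap is consistent with averaging over repetitions (or with rounding).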
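This page does not spell out how the RAPr and derivative features are computed. As the mention of the standard deviation suggests, the sketch below assumes that the RAPr features are the expectation and standard deviation of the PageRank vector x(A) when the teleportation parameter A is random, and that "the derivative" is dx/dα at a fixed α. It uses the standard PageRank linear system (I − αP)x = (1 − α)v with a uniform teleportation vector v. Differentiating that system gives (I − αP)x' = Px − v. The expectation and standard deviation are estimated by plain Monte Carlo, which may differ from the thesis's own computation. The toy graph format, the dangling-node convention, and the function names (`walk`, `solve`, `pagerank`, `pagerank_derivative`, `rapr_features`) are all hypothetical. Python's `random.betavariate` uses the standard Beta(a, b) parameterization, which requires positive parameters, so the sketch does not reproduce rows such as Beta(−0.5, −0.5, 0.3, 0.99). It also requires the upper limit to be below 1 so that the iteration converges, which excludes rows such as Beta(1, 1, 0, 1).

```python
import random
import statistics

# Toy graph format (an assumption of this sketch): graph[u] lists the out-neighbors of u.
Graph = list[list[int]]


def walk(graph: Graph, x: list[float]) -> list[float]:
    """Return P @ x for the column-stochastic random-walk matrix P of `graph`.

    Dangling nodes (no out-links) spread their mass uniformly; this is one
    common convention, assumed here rather than taken from the thesis.
    """
    n = len(graph)
    y = [0.0] * n
    dangling = 0.0
    for u, out in enumerate(graph):
        if out:
            share = x[u] / len(out)
            for w in out:
                y[w] += share
        else:
            dangling += x[u]
    return [yi + dangling / n for yi in y]


def solve(graph: Graph, alpha: float, rhs: list[float],
          tol: float = 1e-12, max_iter: int = 100_000) -> list[float]:
    """Solve (I - alpha P) z = rhs with the iteration z <- alpha P z + rhs.

    The iteration contracts in the 1-norm whenever 0 <= alpha < 1.
    """
    z = list(rhs)
    for _ in range(max_iter):
        z_next = [alpha * pz + r for pz, r in zip(walk(graph, z), rhs)]
        if sum(abs(p - q) for p, q in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    raise RuntimeError("iteration did not converge")


def pagerank(graph: Graph, alpha: float) -> list[float]:
    """x(alpha) solving (I - alpha P) x = (1 - alpha) v, with uniform v."""
    n = len(graph)
    return solve(graph, alpha, [(1.0 - alpha) / n] * n)


def pagerank_derivative(graph: Graph, alpha: float) -> list[float]:
    """dx/dalpha, from differentiating the linear system: (I - alpha P) x' = P x - v."""
    n = len(graph)
    x = pagerank(graph, alpha)
    return solve(graph, alpha, [px - 1.0 / n for px in walk(graph, x)])


def rapr_features(graph: Graph, a: float, b: float, lo: float, hi: float,
                  samples: int = 200, seed: int = 0) -> tuple[list[float], list[float]]:
    """Monte Carlo mean and standard deviation of x(A), A = lo + (hi - lo) * Beta(a, b).

    Beta(a, b) uses the standard parameterization of random.betavariate (a, b > 0).
    """
    if not 0.0 <= lo < hi < 1.0:
        raise ValueError("this sketch assumes 0 <= lo < hi < 1")
    rng = random.Random(seed)
    draws = [pagerank(graph, lo + (hi - lo) * rng.betavariate(a, b))
             for _ in range(samples)]
    per_node = list(zip(*draws))  # one tuple of sampled values per node
    mean = [statistics.fmean(col) for col in per_node]
    std = [statistics.pstdev(col) for col in per_node]
    return mean, std


if __name__ == "__main__":
    toy = [[1, 2], [2], [0], [0, 2]]  # hypothetical 4-node graph
    mean, std = rapr_features(toy, a=2.0, b=2.0, lo=0.5, hi=0.9)
    print("E[x(A)]  :", [round(m, 4) for m in mean])
    print("Std[x(A)]:", [round(s, 4) for s in std])
    print("dx/dalpha at 0.85:", [round(d, 4) for d in pagerank_derivative(toy, 0.85)])
```

The Beta(2, 2) distribution on [0.5, 0.9] in the usage example is illustrative only. In a feature pipeline like the one described above, each node's mean, standard deviation, or derivative would become one column of the classifier's input. Replacing the standard deviation with `pagerank_derivative` at α = 0.75, 0.85, or 0.95 would correspond to the three derivative rows of the table.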