
It is worse, incredibly, than a measure based on in-degree. They conjecture that this phenomenon arises from link manipulation to enhance PageRank. Thus, it would seem that they empirically confirm Zawodny’s claim. If this is true, why would Google still use PageRank?

We believe that the resolution lies in how Google uses PageRank. Recently, Becchetti et al. [2008] showed that metrics derived from PageRank are helpful in identifying spam pages. PageRank also helps web crawling operations [Lee et al., 2008]. Google supposedly uses PageRank to influence the crawling rate as well [Cutts, 2006]. These two tasks are fundamentally different from determining the order of web search results. PageRank is still useful.

7.1.2 Is picking a distribution for α really helpful?

In chapter 4, we argue that the RAPr model suggests an obvious choice for α and the distribution of α: use the values obtained by studying surfers. Critics will object: these choices may not yield the best results for web search or spam detection. We agree, and our web spam analysis supports this objection.

Our point is that, regardless of the application, picking a distribution tends to yield more information about the graph. Some of the information is correlated with PageRank (the expectation) and some appears uncorrelated (the standard deviation). Picking a distribution gives us more flexibility to obtain a “best” vector. For our spam example, the best results occurred with a distribution that looks nothing like the empirically measured distribution.

In terms of sensitivity, picking a distribution and using the standard deviation seems superior to using a sensitivity measure from the derivative. Values of the derivative are difficult to interpret, whereas the standard deviation sensitivity values have a natural probabilistic interpretation. Our advice: Just pick a distribution.

7.1.3 Why use such a strict tolerance in your computation?
In all the PageRank computations throughout the thesis, we computed PageRank vectors with a strict tolerance, typically tighter than $10^{-10}$. These vectors are needlessly accurate. Many applications use PageRank vectors with loose tolerances around $10^{-4}$ [Kamvar et al., 2003]. We felt that computing PageRank accurately was necessary to distinguish between the effects of our new sensitivity measures and the effects due to inaccurate computations. Many of the PageRank values are small and a few
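
The concern about tolerances can be illustrated with a minimal sketch. It is not the code used for the thesis computations. It assumes a hypothetical 6-node graph, a plain power iteration with a uniform teleportation vector, and a hypothetical distribution for α (a Beta(2, 2) variable rescaled to [0.5, 0.95]); the function names `transition_matrix` and `pagerank` are introduced here for illustration only. The sketch compares the error left by a loose tolerance of $10^{-4}$ with the standard deviation of PageRank over the sampled values of α, which is the quantity a sensitivity study tries to measure.

```python
import numpy as np

def transition_matrix(edges, n):
    """Column-stochastic matrix: entry (dst, src) is 1/outdegree(src)."""
    A = np.zeros((n, n))
    for src, dst in edges:
        A[dst, src] = 1.0
    return A / A.sum(axis=0)  # assumes every node has at least one out-link

def pagerank(P, alpha, tol, max_iter=100_000):
    """Power iteration on x = alpha*P@x + (1-alpha)*v with uniform v.

    Stops when the 1-norm change between iterates falls below tol.
    Returns the vector and the number of iterations performed.
    """
    n = P.shape[0]
    v = np.full(n, 1.0 / n)
    x = v.copy()
    for k in range(1, max_iter + 1):
        x_next = alpha * (P @ x) + (1.0 - alpha) * v
        delta = np.abs(x_next - x).sum()
        x = x_next
        if delta < tol:
            break
    return x, k

# Hypothetical toy graph (not from the thesis); every node has an out-link.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2),
         (3, 4), (4, 5), (5, 3), (5, 0)]
P = transition_matrix(edges, 6)

# Same alpha, loose versus strict stopping tolerance.
x_loose, it_loose = pagerank(P, 0.85, tol=1e-4)
x_strict, it_strict = pagerank(P, 0.85, tol=1e-10)
solver_error = np.abs(x_loose - x_strict)

# Assumed distribution for alpha: Beta(2, 2) rescaled to [0.5, 0.95].
rng = np.random.default_rng(0)
alphas = 0.5 + 0.45 * rng.beta(2.0, 2.0, size=200)
samples = np.array([pagerank(P, a, tol=1e-10)[0] for a in alphas])
mean = samples.mean(axis=0)  # Monte Carlo estimate of the expectation
std = samples.std(axis=0)    # Monte Carlo estimate of the standard deviation

print("iterations (loose, strict):", it_loose, it_strict)
print("per-node error at loose tolerance:", solver_error)
print("per-node std over alpha samples:  ", std)
```

On a toy graph like this the standard deviations are likely to dwarf the solver error. On a web-scale graph, where most PageRank values are tiny, an error on the order of the loose tolerance could be comparable to the sensitivity values themselves, which is the motivation for the strict tolerance.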