
7.1.1 Is PageRank research still useful?

The death of PageRank has been forecast since 2003 [Zawodny, 2003]. Zawodny claims that the success of PageRank necessarily induces its future failure. Because PageRank utilizes the link structure of the web, it originally produced useful information for web ranking. But the impact of PageRank on web search caused people to change their link structures to manipulate PageRank. Thus, links on the web will become less reliable over time.

It is now 2009, and Google still uses PageRank [Cutts, 2009]. Rumors about its death are greatly exaggerated, apparently. In fact, Cutts [2009] discusses a critical change in the PageRank formulation used by Google. The change is that they no longer construct a 0, 1 sub-stochastic matrix from the link structure, but construct a general sub-stochastic matrix instead.[1] This change shows that PageRank is still useful to Google, and thus research on it matters.

On the other hand, Najork et al. [2007] claimed that PageRank is one of the least effective measures in a machine learning framework for web search. It is worse, incredibly, than a measure based on in-degree. They conjecture that this phenomenon arises from link manipulation to enhance PageRank. Thus, it would seem that they empirically confirm Zawodny's claim. If this is true, why would Google still use PageRank?

We believe that the resolution lies in how Google uses PageRank. Recently, Becchetti et al. [2008] showed that metrics derived from PageRank are helpful in identifying spam pages. PageRank also helps web crawling operations [Lee et al., 2008]. Google supposedly uses PageRank to influence the crawling rate as well [Cutts, 2006]. These two tasks are fundamentally different from determining the order of web search results. PageRank is still useful.

7.1.2 Is picking a distribution for α really helpful?

In chapter 4, we argue that the RAPr model suggests an obvious choice for α and the distribution of α: use the values obtained by studying surfers. Critics will object: these choices may not yield the best results for web search or spam detection. We agree, and our web spam analysis supports this objection. Our point is that, regardless of the application, picking a distribution tends to yield more information about the graph. Some of the information is correlated with PageRank (the expectation) and some appears uncorrelated (the standard deviation). Picking a distribution gives us more flexibility to obtain a "best" vector. For our spam example, the best results occurred with a distribution that looks nothing like the empirically measured distribution.

In terms of sensitivity, picking a distribution and using the standard deviation seems superior to using a sensitivity measure from the derivative. Values of the derivative are difficult to interpret, whereas the standard deviation sensitivity values have a natural probabilistic interpretation.

Our advice: Just pick a distribution.

[1] Formally, for the new sub-stochastic matrix $\bar{P}$ we find that each entry of $e^T \bar{P}$ can be any number between 0 and 1. These column sums need not be 0 or 1 as in the formulation in chapter 2.
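
To make the idea of "picking a distribution" in section 7.1.2 concrete, the following is a minimal Monte Carlo sketch, not the algorithms developed in the dissertation. It assumes α is drawn from a Beta(a, b) distribution with illustrative parameters a = b = 2, uses a plain power iteration with uniform teleportation, and spreads the mass of dangling nodes uniformly. The function names `pagerank` and `random_alpha_moments` and the tiny example graph are hypothetical choices made for this illustration.

```python
import random
import statistics


def pagerank(out_links, alpha, tol=1e-10, max_iter=10000):
    """Power iteration for PageRank with uniform teleportation.

    out_links maps each node to the list of nodes it links to.
    Dangling nodes (no out-links) spread their mass uniformly (an assumption).
    """
    nodes = sorted(out_links)
    n = len(nodes)
    x = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(x[u] for u in nodes if not out_links[u])
        base = (1.0 - alpha) / n + alpha * dangling / n
        nxt = {u: base for u in nodes}
        for u in nodes:
            if out_links[u]:
                share = alpha * x[u] / len(out_links[u])
                for v in out_links[u]:
                    nxt[v] += share
        delta = sum(abs(nxt[u] - x[u]) for u in nodes)
        x = nxt
        if delta < tol:
            break
    return x


def random_alpha_moments(out_links, a=2.0, b=2.0, samples=200, seed=0):
    """Estimate the per-node mean and standard deviation of PageRank
    when alpha ~ Beta(a, b). The Beta parameters are illustrative only."""
    rng = random.Random(seed)
    vectors = [pagerank(out_links, rng.betavariate(a, b)) for _ in range(samples)]
    mean = {u: statistics.fmean([v[u] for v in vectors]) for u in out_links}
    std = {u: statistics.pstdev([v[u] for v in vectors]) for u in out_links}
    return mean, std


if __name__ == "__main__":
    # Hypothetical 4-node graph; node 3 has no in-links.
    graph = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
    mean, std = random_alpha_moments(graph)
    for u in sorted(graph):
        print(u, round(mean[u], 4), round(std[u], 4))
```

In this sketch the mean plays the role of the expectation vector and the standard deviation serves as a per-node sensitivity score with a direct probabilistic reading: it measures how much a node's PageRank varies across the sampled values of α. Sampling is only one way to estimate these quantities and was chosen here for brevity.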
Source: ALGORITHMS FOR PAGERANK SENSITIVITY DISSERTATION (original file name: gleich.pdf)